ABSTRACT. We give an overview of two approaches to probability theory where lower 
and upper probabilities, rather than probabilities, are used: Walley's behavioural theory of 
imprecise probabilities, and Shafer and Vovk's game-theoretic account of probability. We 
show that the two theories are more closely related than would be suspected at first sight, 
and we establish a correspondence between them that (i) has an interesting interpretation, 
and (ii) allows us to freely import results from one theory into the other. Our approach 
leads to an account of probability trees and random processes in the framework of Walley's 
theory. We indicate how our results can be used to reduce the computational complexity of 
dealing with imprecision in probability trees, and we prove an interesting and quite general 
version of the weak law of large numbers. 



1. Introduction 

In recent years, we have witnessed the growth of a number of theories of uncertainty, 
where imprecise (lower and upper) probabilities and previsions, rather than precise (or 
point-valued) probabilities and previsions, have a central part. Here we consider two 
of them, Glenn Shafer and Vladimir Vovk's game-theoretic account of probability [30], 
which is introduced in Section 2, and Peter Walley's behavioural theory [34], outlined in 
Section 3. These seem to have a rather different interpretation, and they certainly have been 
influenced by different schools of thought: Walley follows the tradition of Frank Ramsey 
[22], Bruno de Finetti [11] and Peter Williams [40] in trying to establish a rational model 
for a subject's beliefs in terms of her behaviour. Shafer and Vovk follow an approach 
that has many other influences as well, and is strongly coloured by ideas about gambling 
systems and martingales. They use Cournot's Principle to interpret lower and upper
probabilities (see [29]; and [30, Chapter 2] for a nice historical overview), whereas in Walley's
approach, lower and upper probabilities are defined in terms of a subject's betting rates.

What we set out to do here, 1 and in particular in Sections 4 and 5, is to show that in many 
practical situations, the two approaches are strongly connected. 2 This implies that quite a 
few results, valid in one theory, can automatically be converted and reinterpreted in terms 
of the other. Moreover, we shall see that we can develop an account of coherent immediate 
prediction in the context of Walley's behavioural theory, and prove, in Section 6, a weak 
law of large numbers with an intuitively appealing interpretation. We use this weak law in 
Section 7 to suggest a way of scoring a predictive model that satisfies A. Philip Dawid's 
Prequential Principle [5, 6]. 

Why do we believe these results to be important, or even relevant, to AI? Probabilistic
models are intended to represent an agent's beliefs about the world he is operating in,
beliefs which describe and even determine the actions he will take in a diversity of situations.
1 An earlier and condensed version of this paper, with much less discussion and without proofs, was presented 
at the ISIPTA '07 conference [7]. 

2 Our line of reasoning here should be contrasted with the one in [29], where Shafer et al. use the
game-theoretic framework developed in [30] to construct a theory of predictive upper and lower previsions whose
interpretation is based on Cournot's Principle. See also the comments near the end of Section 5. 
Probability theory provides a normative system for reasoning and making decisions in the 
face of uncertainty. Bayesian, or precise, probability models have the property that they 
are completely decisive: a Bayesian agent always has an optimal choice when faced with a 
number of alternatives, whatever his state of information. While many may view this as an 
advantage, it is not always very realistic. Imprecise probability models try to deal with this 
problem by explicitly allowing for indecision, while retaining the normative, or coherentist 
stance of the Bayesian approach. We refer to [8, 34, 35] for discussions about how this can 
be done. 

Imprecise probability models appear in a number of AI-related fields. For instance in 
probabilistic logic: it was already known to George Boole [1] that the result of probabilistic 
inferences may be a set of probabilities (an imprecise probability model), rather than a 
single probability. This is also important for dealing with missing or incomplete data, 
leading to so-called partial identification of probabilities, see for instance [9, 19]. There is 
also a growing literature on so-called credal nets [3, 4]: these are essentially Bayesian nets 
with imprecise conditional probabilities. 

We are convinced that it is mainly the mathematical and computational complexity often 
associated with imprecise probability models that is keeping them from becoming a more 
widely used tool for modelling uncertainty. But we believe that the results reported here 
can help make inroads in reducing this complexity. Indeed, the upshot of our being able 
to connect Walley's approach with Shafer and Vovk's, is twofold. First of all, we can 
develop a theory of imprecise probability trees: probability trees where the transition from 
a node to its children is described by an imprecise probability model in Walley's sense. 
Our results provide the necessary apparatus for making inferences in such trees. And 
because probability trees are so closely related to random processes, this effectively brings 
us into a position to start developing a theory of (event-driven) random processes where 
the uncertainty can be described using imprecise probability models. We illustrate this in 
Examples 1 and 3, and in Section 8. 

Secondly, we are able to prove so-called Marginal Extension results (Theorems 3 and 7,
Proposition 9), which lead to backwards recursion, and dynamic programming-like methods
that allow for an exponential reduction in the computational complexity of making
inferences in such imprecise probability trees. This is also illustrated in Example 3 and
Section 8. For (precise) probability trees, similar techniques were described in Shafer's
book on causal reasoning [27]. They seem to go back to Christiaan Huygens, who drew 
the first probability tree, and showed how to reason with it, in his solution to Pascal and 
Fermat's Problem of Points. 3 

2. Shafer and Vovk's game-theoretic approach to probability 

In their game-theoretic approach to probability [30], Shafer and Vovk consider a game 
with two players, Reality and Sceptic, who play according to a certain protocol. They 
obtain the most interesting results for what they call coherent probability protocols. This 
section is devoted to explaining what this means. 

2.1. Reality's event tree. We begin with a first and basic assumption, dealing with how 
the first player, Reality, plays. 

G1. Reality makes a number of moves, where the possible next moves may depend on the 
previous moves he has made, but do not in any way depend on the previous moves 
made by Sceptic. 

This means that we can represent his game-play by an event tree (see also [26, 28] for more 
information about event trees). We restrict ourselves here to the discussion of bounded 
protocols, where Reality makes only a finite and bounded number of moves from the
beginning to the end of the game, whatever happens. But we don't exclude the possibility



3 See Section 8 for more details and precise references. 
that at some point in the tree, Reality has the choice between an infinite number of next 
moves. We shall come back to these assumptions further on, once we have the appropriate 
notational tools to make them more explicit. 4 



Figure 1. A simple event tree for Reality, displaying the initial situation □, other
non-terminal situations (such as $t$) as grey circles, and paths, or terminal situations
(such as $\omega$), as black circles. Also depicted is a cut $U = \{u_1, u_2, u_3, u_4\}$ of □.
Observe that $t$ (strictly) precedes $u_1$: $t \sqsubset u_1$, and that $C(t) = \{u_1, u_2\}$ is the
children cut of $t$.

Let us establish some terminology related to Reality's event tree. 

2.1.1. Paths, situations and events. A path in the tree represents a possible sequence of
moves for Reality from the beginning to the end of the game. We denote the set of all
possible paths $\omega$ by $\Omega$, the sample space of the game.

A situation $t$ is some connected segment of a path that is initial, i.e., starts at the root
of the tree. It identifies the moves Reality has made up to a certain point, and it can be
identified with a node in the tree. We denote the set of all situations by $\Omega^\lozenge$. It includes
the set $\Omega$ of terminal situations, which can be identified with paths. All other situations
are called non-terminal; among them is the initial situation □, which represents the empty
initial segment. See Fig. 1 for a simple graphical example explaining these notions.

If for two situations $s$ and $t$, $s$ is a(n initial) segment of $t$, then we say that $s$ precedes $t$
or that $t$ follows $s$, and write $s \sqsubseteq t$, or alternatively $t \sqsupseteq s$. If $\omega$ is a path and
$t \sqsubseteq \omega$ then we say that the path $\omega$ goes through situation $t$. We write $s \sqsubset t$, and say
that $s$ strictly precedes $t$, if $s \sqsubseteq t$ and $s \neq t$.

An event $A$ is a set of paths, or in other words, a subset of the sample space: $A \subseteq \Omega$.
With an event $A$, we can associate its indicator $I_A$, which is the real-valued map on $\Omega$ that
assumes the value 1 on $A$, and 0 elsewhere.

We denote by $\uparrow t := \{\omega \in \Omega : t \sqsubseteq \omega\}$ the set of all paths that go through $t$: $\uparrow t$ is the
event that corresponds to Reality getting to a situation $t$. It is clear that not all events will
be of the type $\uparrow t$. Shafer [27] calls events of this type exact. Further on, in Section 4, exact
events will be the only events that can be legitimately conditioned on, because they are the
only events whose possible occurrence can be foreseen as part of Reality's game-play.

2.1.2. Cuts of a situation. Call a cut $U$ of a situation $t$ any set of situations that follow $t$,
and such that for all paths $\omega$ through $t$, there is a unique $u \in U$ that $\omega$ goes through. In
other words:

(i) $(\forall u \in U)(u \sqsupseteq t)$; and

(ii) $(\forall \omega \sqsupseteq t)(\exists! u \in U)(\omega \sqsupseteq u)$;



4 Essentially, the width of the tree may be infinite, but its depth should be finite. 
see also Fig. 1. Alternatively, a set $U$ of situations is a cut of $t$ if and only if the
corresponding set $\{\uparrow u : u \in U\}$ of exact events is a partition of the exact event $\uparrow t$. A cut can be
interpreted as a (complete) stopping time.

If a situation $s \sqsupseteq t$ precedes (follows) some element of a cut $U$ of $t$, then we say that
$s$ precedes (follows) $U$, and we write $s \sqsubseteq U$ ($s \sqsupseteq U$). Similarly for 'strictly precedes
(follows)'. For two cuts $U$ and $V$ of $t$, we say that $U$ precedes $V$ if each element of $U$ is
followed by some element of $V$.

A child of a non-terminal situation $t$ is a situation that immediately follows it. The set
$C(t)$ of children of $t$ constitutes a cut of $t$, called its children cut. Also, the set $\Omega$ of terminal
situations is a cut of □, called its terminal cut. The event $\uparrow t$ is the corresponding terminal
cut of a situation $t$.
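
As a small illustration of these definitions, the following Python sketch encodes situations as tuples of Reality's moves and checks conditions (i) and (ii) for a candidate cut. The encoding, the function names and the toy tree are assumptions made here for illustration only; they are not part of the theory being reviewed.

```python
from itertools import product

def precedes(s, t):
    """True if situation s is an initial segment of t (s may equal t)."""
    return t[:len(s)] == s

def paths_through(t, paths):
    """The exact event of t: all paths that go through situation t."""
    return {w for w in paths if precedes(t, w)}

def is_cut(U, t, paths):
    """Conditions (i) and (ii): every u in U follows t, and every path
    through t goes through exactly one element of U."""
    if not all(precedes(t, u) for u in U):
        return False
    return all(sum(precedes(u, w) for u in U) == 1
               for w in paths_through(t, paths))

def children(t, paths):
    """Children cut C(t): the situations one move beyond t."""
    return {w[:len(t) + 1] for w in paths_through(t, paths) if len(w) > len(t)}

# Toy tree (an assumption for illustration): two successive moves from {a, b}.
ROOT = ()
PATHS = set(product("ab", repeat=2))
```

In this toy tree, both the children cut of the root and the terminal cut pass the check, whereas a set that misses some path, or that meets a path twice, does not.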

2.1.3. Reality's move spaces. We call a move $w$ for Reality in a non-terminal situation $t$ an
arc that connects $t$ with one of its children $s \in C(t)$, meaning that $s = tw$ is the concatenation
of the segment $t$ and the arc $w$. See Fig. 2.

Figure 2. An event tree for Reality, with the move space $W_t$ and the
corresponding children cut $C(t)$ of a non-terminal situation $t$.

Reality's move space in $t$ is the set $W_t$ of those moves $w$ that Reality can make in $t$:
$W_t = \{w : tw \in C(t)\}$. We have already mentioned that $W_t$ may be (countably or uncountably)
infinite: there may be situations where Reality has the choice between an infinity of
next moves. But every $W_t$ should contain at least two elements: otherwise there is no
choice for Reality to make in situation $t$.

2.2. Processes and variables. We now have all the necessary tools to represent Reality's 
game-play. This game-play can be seen as a basis for an event-driven, rather than a
time-driven, account of a theory of uncertain, or random, processes. The driving events are,
of course, the moves that Reality makes. 5 In a theory of processes, we generally
consider things that depend on (the succession of) these moves. This leads to the following
definitions. 

Any (partial) function on the set of situations $\Omega^\lozenge$ is called a process, and any process
whose domain includes all situations that follow a situation $t$ is called a $t$-process. Of
course, a $t$-process is also an $s$-process for all $s \sqsupseteq t$; when we call it an $s$-process, this
means that we are restricting our attention to its values in all situations that follow $s$.

A special example of a $t$-process is the distance $d(t,\cdot)$ which for any situation $s \sqsupseteq t$
returns the number of steps $d(t,s)$ along the tree from $t$ to $s$. When we said before that we
are only considering bounded protocols, we meant that there is a natural number $D$ such
that $d(t,s) \leq D$ for all situations $t$ and all $s \sqsupseteq t$.

Similarly, any (partial) function on the set of paths $\Omega$ is called a variable, and any
variable on $\Omega$ whose domain includes all paths that go through a situation $t$ is called a


5 These so-called Humean events shouldn't be confused with the Moivrean events we have considered before,
and which are subsets of the sample space $\Omega$. See Shafer [27, Chapter 1] for terminology and more explanation.
$t$-variable. If we restrict a $t$-process $\mathcal{F}$ to the set $\uparrow t$ of all terminal situations that follow $t$,
we obtain a $t$-variable, which we denote by $\mathcal{F}_\Omega$.

If $U$ is a cut of $t$, then we call a $t$-variable $g$ $U$-measurable if for all $u$ in $U$, $g$ assumes
the same value $g(u) := g(\omega)$ for all paths $\omega$ that go through $u$. In that case we can also
consider $g$ as a variable on $U$, which we denote as $g_U$.

If $\mathcal{F}$ is a $t$-process, then with any cut $U$ of $t$ we can associate a $t$-variable $\mathcal{F}_U$, which
assumes the same value $\mathcal{F}_U(\omega) := \mathcal{F}(u)$ in all $\omega$ that follow $u \in U$. This $t$-variable is
clearly $U$-measurable, and can be considered as a variable on $U$. This notation is consistent
with the notation $\mathcal{F}_\Omega$ introduced earlier.

Similarly, we can associate with $\mathcal{F}$ a new, $U$-stopped, $t$-process $U(\mathcal{F})$, as follows:

$$U(\mathcal{F})(s) := \begin{cases} \mathcal{F}(s) & \text{if } t \sqsubseteq s \sqsubseteq U \\ \mathcal{F}(u) & \text{if } u \in U \text{ and } u \sqsubseteq s. \end{cases}$$

The $t$-variable $U(\mathcal{F})_\Omega$ is $U$-measurable, and is actually equal to $\mathcal{F}_U$:

$$U(\mathcal{F})_\Omega = \mathcal{F}_U. \tag{1}$$

The following intuitive example will clarify these notions.

Example 1 (Flipping coins). Consider flipping two coins, one after the other. This leads
to the event tree depicted in Fig. 3. The identifying labels for the situations should be
intuitively clear: e.g., in the initial situation '□ = ?,?' none of the coins have been flipped,
in the non-terminal situation 'h,?' the first coin has landed 'heads' and the second coin
hasn't been flipped yet, and in the terminal situation 't,t' both coins have been flipped and
have landed 'tails'.

Figure 3. The event tree associated with two successive coin flips.
Also depicted are two cuts, $X^1$ and $U$, of the initial situation.

First, consider the real process $\mathcal{N}$, which in each situation $s$, returns the number $\mathcal{N}(s)$
of heads obtained so far, e.g., $\mathcal{N}(?,?) = 0$ and $\mathcal{N}(h,?) = 1$. If we restrict the process
$\mathcal{N}$ to the set $\Omega$ of all terminal elements, we get a real variable $\mathcal{N}_\Omega$, whose values are:
$\mathcal{N}_\Omega(h,h) = 2$, $\mathcal{N}_\Omega(h,t) = \mathcal{N}_\Omega(t,h) = 1$ and $\mathcal{N}_\Omega(t,t) = 0$.

Consider the cut $U$ of the initial situation, which corresponds to the following stopping
time: "stop after two flips, or as soon as an outcome is heads"; see Fig. 3. The values of
the corresponding variable $\mathcal{N}_U$ are given by: $\mathcal{N}_U(h,h) = \mathcal{N}_U(h,t) = 1$, $\mathcal{N}_U(t,h) = 1$ and
$\mathcal{N}_U(t,t) = 0$. So $\mathcal{N}_U$ is $U$-measurable, and can therefore be considered as a map on the
elements $h,?$ and $t,h$ and $t,t$ of $U$, with in particular $\mathcal{N}_U(h,?) = 1$.

Next, consider the processes $\mathcal{F}, \mathcal{F}^1, \mathcal{F}^2 : \Omega^\lozenge \to \{h,t,?\}$, defined as follows:
$$\begin{array}{l|ccccccc}
s & ?,? & h,? & t,? & h,h & h,t & t,h & t,t \\ \hline
\mathcal{F}(s) & ? & h & t & h & t & h & t \\
\mathcal{F}^1(s) & ? & h & t & h & h & t & t \\
\mathcal{F}^2(s) & ? & ? & ? & h & t & h & t
\end{array}$$

$\mathcal{F}$ returns the outcome of the latest, $\mathcal{F}^1$ the outcome of the first, and $\mathcal{F}^2$ that of the second
coin flip. The associated variables $\mathcal{F}^1_\Omega$ and $\mathcal{F}^2_\Omega$ give, in each element of the sample space,
the respective outcomes of the first and second coin flips.

The variable $\mathcal{F}^1_\Omega$ is $X^1$-measurable: as soon as we reach (any situation on) the cut $X^1$, its
value is completely determined, i.e., we know the outcome of the first coin flip; see Fig. 3
for the definition of $X^1$.

We can associate with the process $\mathcal{F}$ the variable $\mathcal{F}_{X^1}$ that is also $X^1$-measurable: it
returns, in any element of the sample space, the outcome of the first coin flip. Alternatively,
we can stop the process $\mathcal{F}$ after one coin flip, which leads to the $X^1$-stopped process
$X^1(\mathcal{F})$. This new process is of course equal to $\mathcal{F}^1$, and for the corresponding variable
$\mathcal{F}^1_\Omega$, we have that $X^1(\mathcal{F})_\Omega = \mathcal{F}^1_\Omega = \mathcal{F}_{X^1}$; also see Eq. (1). ♦
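
The notions in Example 1 are easy to check mechanically. The sketch below is a minimal Python rendering, assuming situations are encoded as tuples of outcomes (so 'h,?' becomes `("h",)`); all function names are hypothetical and chosen here only for illustration. It builds the variable associated with a process and a cut, and the stopped process, and lets one verify Eq. (1) on this tree.

```python
PATHS = [("h", "h"), ("h", "t"), ("t", "h"), ("t", "t")]
SITUATIONS = [(), ("h",), ("t",)] + PATHS
X1 = [("h",), ("t",)]                      # cut: stop after the first flip
U = [("h",), ("t", "h"), ("t", "t")]       # cut: stop at first head or after two flips

def precedes(s, t):
    """True if s is an initial segment of t."""
    return t[:len(s)] == s

def heads_so_far(s):
    """The process counting the heads obtained so far."""
    return s.count("h")

def latest(s):
    """The process returning the outcome of the latest flip ('?' if none)."""
    return s[-1] if s else "?"

def first(s):
    """The process returning the outcome of the first flip ('?' if none)."""
    return s[0] if s else "?"

def variable_on_cut(process, cut):
    """The variable that maps each path to the process value in the
    unique cut element that the path goes through."""
    out = {}
    for w in PATHS:
        (u,) = [u for u in cut if precedes(u, w)]   # unique by definition of a cut
        out[w] = process(u)
    return out

def stopped(process, cut):
    """The cut-stopped process: frozen at the cut element once it is reached."""
    def stopped_process(s):
        reached = [u for u in cut if precedes(u, s)]
        return process(reached[0]) if reached else process(s)
    return stopped_process
```

For instance, `stopped(latest, X1)` agrees with `first` in every situation, and restricting it to the paths gives the same map as `variable_on_cut(latest, X1)`.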

2.3. Sceptic's game-play. We now turn to the other player, Sceptic. His possible moves 
may well depend on the previous moves that Reality has made, in the following sense. 
In each non-terminal situation $t$, he has some set $S_t$ of moves $s$ available to him, called
Sceptic's move space in $t$. We make the following assumption:

G2. In each non-terminal situation $t$, there is a (positive or negative) gain for Sceptic
associated with each of the possible moves $s$ in $S_t$ that Sceptic can make. This gain
depends only on the situation $t$ and the next move $w$ that Reality will make.

This means that for each non-terminal situation $t$ there is a gain function $\lambda_t : S_t \times W_t \to \mathbb{R}$,
such that $\lambda_t(s,w)$ represents the change in Sceptic's capital in situation $t$ when he makes
move $s$ and Reality makes move $w$.

2.3.1. Strategies and capital processes. Let us introduce some further notions and
terminology related to Sceptic's game-play. A strategy $\mathcal{P}$ for Sceptic is a partial process defined
on the set $\Omega^\lozenge \setminus \Omega$ of non-terminal situations, such that $\mathcal{P}(t) \in S_t$ is the corresponding move
that Sceptic will make in each non-terminal situation $t$.

With each such strategy $\mathcal{P}$ there corresponds a capital process $\mathcal{K}^{\mathcal{P}}$, whose value in
each situation $t$ gives us Sceptic's capital accumulated so far, when he starts out with zero
capital in □ and plays according to the strategy $\mathcal{P}$. It is given by the recursion relation

$$\mathcal{K}^{\mathcal{P}}(tw) = \mathcal{K}^{\mathcal{P}}(t) + \lambda_t(\mathcal{P}(t), w), \quad w \in W_t,$$

with initial condition $\mathcal{K}^{\mathcal{P}}(\Box) = 0$. Of course, when Sceptic starts out (in □) with capital
$\alpha$ and uses strategy $\mathcal{P}$, his corresponding accumulated capital is given by the process
$\alpha + \mathcal{K}^{\mathcal{P}}$. In the terminal situations, his accumulated capital is then given by the real
variable $\alpha + \mathcal{K}^{\mathcal{P}}_\Omega$.

If we start in a non-terminal situation $t$, rather than in □, then we can consider
$t$-strategies $\mathcal{P}$ that tell Sceptic how to move starting from $t$ onwards, and the corresponding
capital process $\mathcal{K}^{\mathcal{P}}$ is then also a $t$-process, that tells us how much capital Sceptic has
accumulated since starting with zero capital in situation $t$ and using $t$-strategy $\mathcal{P}$.

2.3.2. Lower and upper prices. The assumptions G1 and G2 outlined above determine
so-called gambling protocols. They are sufficient for us to be able to define lower and upper
prices for real variables.

Consider a non-terminal situation $t$ and a real $t$-variable $f$. The upper price $\overline{E}_t(f)$ for $f$
in $t$ is defined as the infimum capital $\alpha$ that Sceptic has to start out with in $t$ in order that
there would be some $t$-strategy $\mathcal{P}$ such that his accumulated capital $\alpha + \mathcal{K}^{\mathcal{P}}$ allows him,
at the end of the game, to hedge $f$, whatever moves Reality makes after $t$:

$$\overline{E}_t(f) := \inf\{\alpha : \alpha + \mathcal{K}^{\mathcal{P}}_\Omega \geq f \text{ for some } t\text{-strategy } \mathcal{P}\}, \tag{2}$$

where $\alpha + \mathcal{K}^{\mathcal{P}}_\Omega \geq f$ is taken to mean that $\alpha + \mathcal{K}^{\mathcal{P}}(\omega) \geq f(\omega)$ for all terminal situations
$\omega$ that go through $t$. Similarly, for the lower price $\underline{E}_t(f)$ for $f$ in $t$:

$$\underline{E}_t(f) := \sup\{\alpha : \alpha - \mathcal{K}^{\mathcal{P}}_\Omega \leq f \text{ for some } t\text{-strategy } \mathcal{P}\}, \tag{3}$$

so $\underline{E}_t(f) = -\overline{E}_t(-f)$. If we start from the initial situation $t = \Box$, we simply get the upper
and lower prices for a real variable $f$, which we also denote by $\overline{E}(f)$ and $\underline{E}(f)$.
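
To make the recursion for the capital process and the hedging condition in the definition of the upper price concrete, here is a minimal Python sketch on the two-coin tree. The gain function (Sceptic buys tickets that pay 1 on heads at a fixed price of 1/2), the example strategy and all names are assumptions introduced purely for illustration; they do not come from Shafer and Vovk's protocols.

```python
from fractions import Fraction

PATHS = [("h", "h"), ("h", "t"), ("t", "h"), ("t", "t")]

def gain(stake, w, price=Fraction(1, 2)):
    """Hypothetical gain function: Sceptic buys `stake` tickets that pay 1
    on heads, at the given price (the same in every situation)."""
    return stake * ((1 if w == "h" else 0) - price)

def capital(strategy, situation):
    """Capital process via the recursion K(tw) = K(t) + gain(P(t), w),
    with K(root) = 0; a situation is a tuple of Reality's moves."""
    if not situation:
        return Fraction(0)
    t, w = situation[:-1], situation[-1]
    return capital(strategy, t) + gain(strategy(t), w)

def double_after_tails(t):
    """Example strategy: stake 1 at the start, 2 after a tail, 0 after a head."""
    if not t:
        return Fraction(1)
    return Fraction(2) if t[-1] == "t" else Fraction(0)

def hedges(alpha, strategy, f, paths=PATHS):
    """True if starting capital alpha plus the final capital dominates f."""
    return all(alpha + capital(strategy, w) >= f(w) for w in paths)

def at_least_one_head(w):
    return 1 if "h" in w else 0
```

Any starting capital for which `hedges` returns `True` for some strategy is an upper bound on the upper price of the variable in this toy protocol; the infimum over all strategies is not computed here.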

2.3.3. Coherent probability protocols. Requirements G1 and G2 for gambling protocols
allow the moves, move spaces and gain functions for Sceptic to be just about anything. We
now impose further conditions on Sceptic's move spaces.

A gambling protocol is called a probability protocol when besides G1 and G2, two more
requirements are satisfied.

P1. For each non-terminal situation $t$, Sceptic's move space $S_t$ is a convex cone in some
linear space: $a_1 s_1 + a_2 s_2 \in S_t$ for all non-negative real numbers $a_1$ and $a_2$ and all $s_1$
and $s_2$ in $S_t$.

P2. For each non-terminal situation $t$, Sceptic's gain function $\lambda_t$ has the following
linearity property: $\lambda_t(a_1 s_1 + a_2 s_2, w) = a_1 \lambda_t(s_1, w) + a_2 \lambda_t(s_2, w)$ for all non-negative real
numbers $a_1$ and $a_2$, all $s_1$ and $s_2$ in $S_t$ and all $w$ in $W_t$.

Finally, a probability protocol is called coherent 6 when moreover:

C. For each non-terminal situation $t$, and for each $s$ in $S_t$ there is some $w$ in $W_t$ such that
$\lambda_t(s,w) \leq 0$.

It is clear what this last requirement means: in each non-terminal situation, Reality has a
strategy for playing from $t$ onwards such that Sceptic can't (strictly) increase his capital
from $t$ onwards, whatever $t$-strategy he might use.

For such coherent probability protocols, Shafer and Vovk prove a number of interesting
properties for the corresponding lower (and upper) prices. We list a number of them here.
For any real $t$-variable $f$, we can associate with a cut $U$ of $t$ another special $U$-measurable
$t$-variable $\underline{E}_U(f)$ by $\underline{E}_U(f)(\omega) = \underline{E}_u(f)$, for all paths $\omega$ through $t$, where $u$ is the unique
situation in $U$ that $\omega$ goes through. For any two real $t$-variables $f_1$ and $f_2$, $f_1 \leq f_2$ is taken
to mean that $f_1(\omega) \leq f_2(\omega)$ for all paths $\omega$ that go through $t$.

Proposition 1 (Properties of lower and upper prices in a coherent probability protocol
[30]). Consider a coherent probability protocol, let $t$ be a non-terminal situation, $f$, $f_1$
and $f_2$ real $t$-variables, and $U$ a cut of $t$. Then

1. $\inf_{\omega \in \uparrow t} f(\omega) \leq \underline{E}_t(f) \leq \overline{E}_t(f) \leq \sup_{\omega \in \uparrow t} f(\omega)$ [convexity];

2. $\underline{E}_t(f_1 + f_2) \geq \underline{E}_t(f_1) + \underline{E}_t(f_2)$ [super-additivity];

3. $\underline{E}_t(\lambda f) = \lambda \underline{E}_t(f)$ for all real $\lambda \geq 0$ [non-negative homogeneity];

4. $\underline{E}_t(f + \alpha) = \underline{E}_t(f) + \alpha$ for all real $\alpha$ [constant additivity];

5. $\underline{E}_t(\alpha) = \alpha$ for all real $\alpha$ [normalisation];

6. $f_1 \leq f_2$ implies that $\underline{E}_t(f_1) \leq \underline{E}_t(f_2)$ [monotonicity];

7. $\underline{E}_t(f) = \underline{E}_t(\underline{E}_U(f))$ [law of iterated expectation].
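
The law of iterated expectation (property 7), applied with the children cut of each situation, suggests a backward recursion: the lower price in a situation is obtained from the lower prices in its children. The Python sketch below illustrates this on a two-flip tree. It rests on an assumption that is not taken from the text: the one-step lower price of a local payoff is taken to be its smallest expectation when the probability of heads ranges over an interval `[P_LO, P_HI]`. The bounds and all names are hypothetical.

```python
from fractions import Fraction

P_LO, P_HI = Fraction(1, 4), Fraction(1, 2)   # assumed bounds on the chance of heads
DEPTH = 2                                     # number of coin flips

def local_lower(g_heads, g_tails):
    """Assumed one-step lower price: the smallest expectation of the local
    payoff over heads-probabilities in [P_LO, P_HI] (attained at an endpoint)."""
    return min(p * g_heads + (1 - p) * g_tails for p in (P_LO, P_HI))

def lower_price(f, t=()):
    """Lower price of f in situation t, by backward recursion over children cuts."""
    if len(t) == DEPTH:
        return Fraction(f(t))
    return local_lower(lower_price(f, t + ("h",)), lower_price(f, t + ("t",)))

def upper_price(f, t=()):
    """Conjugate upper price: the negative of the lower price of -f."""
    return -lower_price(lambda w: -f(w), t)

def number_of_heads(w):
    return w.count("h")
```

Each situation is visited once per evaluation, so the work grows with the number of situations in the tree rather than with the number of strategies Sceptic could follow.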

What is more, Shafer and Vovk use specific instances of such coherent probability protocols
to prove various limit theorems (such as the law of large numbers, the central limit
theorem, the law of the iterated logarithm), from which they can derive, as special cases, 
the well-known measure-theoretic versions. We shall come back to this in Section 6. 

The game-theoretic account of probability we have described so far, is very general. 
But it seems to pay little or no attention to beliefs that Sceptic, or other, perhaps additional 
players in these games might entertain about how Reality will move through its event tree. 
This might seem strange, because at least according to the personalist and epistemicist 



6 For a discussion of the use of 'coherent' here, we refer to [29, Appendix C].
school, probability is all about beliefs. In order to find out how we can incorporate beliefs 
into the game-theoretic framework, we now turn to Walley's imprecise probability models. 

3. Walley's behavioural approach to probability 

In his book on the behavioural theory of imprecise probabilities [34], Walley considers 
many different types of related uncertainty models. We shall restrict ourselves here to the 
most general and most powerful one, which also turns out to be the easiest to explain, 
namely coherent sets of really desirable gambles; see also [36]. 

Consider a non-empty set $\Omega$ of possible alternatives $\omega$, only one of which actually
obtains (or will obtain); we assume that it is possible, at least in principle, to determine
which alternative does so. Also consider a subject who is uncertain about which possible
alternative actually obtains (or will obtain). A gamble $f$ on $\Omega$ is a real-valued map on $\Omega$, and
it is interpreted as an uncertain reward, expressed in units of some predetermined linear
utility scale: if $\omega$ actually obtains, then the reward is $f(\omega)$, which may be positive or
negative. We use the notation $\mathcal{G}(\Omega)$ for the set of all gambles on $\Omega$. Walley [34] assumes
gambles to be bounded. We make no such boundedness assumption here. 7

If a subject accepts a gamble $f$, this is taken to mean that she is willing to engage in the
transaction where, (i) first it is determined which $\omega$ obtains, and (ii) then she receives the
reward $f(\omega)$. We can try and model the subject's beliefs about $\Omega$ by considering which
gambles she accepts.

3.1. Coherent sets of really desirable gambles. Suppose our subject specifies some set
$\mathcal{R}$ of gambles she accepts, called a set of really desirable gambles. Such a set is called
coherent if it satisfies the following rationality requirements:

D1. if $f < 0$ then $f \notin \mathcal{R}$ [avoiding partial loss];

D2. if $f \geq 0$ then $f \in \mathcal{R}$ [accepting partial gain];

D3. if $f_1$ and $f_2$ belong to $\mathcal{R}$ then their (point-wise) sum $f_1 + f_2$ also belongs to $\mathcal{R}$
[combination];

D4. if $f$ belongs to $\mathcal{R}$ then its (point-wise) scalar product $\lambda f$ also belongs to $\mathcal{R}$ for all
non-negative real numbers $\lambda$ [scaling].

Here '$f < 0$' means '$f \leq 0$ and not $f = 0$'. Walley has also argued that, besides D1-D4,
sets of really desirable gambles should satisfy an additional axiom:

D5. $\mathcal{R}$ is $\mathcal{B}$-conglomerable for any partition $\mathcal{B}$ of $\Omega$: if $I_B f \in \mathcal{R}$ for all $B \in \mathcal{B}$, then also
$f \in \mathcal{R}$ [full conglomerability].

When the set $\Omega$ is finite, all its partitions are finite too, and therefore full conglomerability
becomes a direct consequence of the finitary combination axiom D3. But when $\Omega$ is
infinite, its partitions may be infinite too, and then full conglomerability is a very strong
additional requirement, that is not without controversy. If a model $\mathcal{R}$ is $\mathcal{B}$-conglomerable,
this means that certain inconsistency problems when conditioning on elements $B$ of $\mathcal{B}$ are
avoided; see [34] for more details and examples. Conglomerability of belief models wasn't
required by forerunners of Walley, such as Williams [40], 8 or de Finetti [11]. While we
agree with Walley that conglomerability is a desirable property for sets of really desirable
gambles, we do not believe that full conglomerability is always necessary: it seems that
we only need to require conglomerability with respect to those partitions that we actually
intend to condition our model on. 9 This is the path we shall follow in Section 4.



7 The concept of a really desirable gamble (at least formally) allows for such a generalisation, because the
coherence axioms for real desirability nowhere hinge on such a boundedness assumption, at least not from a
technical mathematical point of view.

8 Axioms related to (D1)-(D4), but not (D5), were actually suggested by Williams for bounded gambles. But
it seems that we need at least some weaker form of (D5), namely the cut conglomerability (D5') considered
further on, to derive our main results: Theorems 3 and 6.

9 The view expressed here seems related to Shafer's, as sketched near the end of [25, Appendix 1]. 
3.2. Conditional lower and upper previsions. Given a coherent set of really desirable
gambles $\mathcal{R}$, we can define conditional lower and upper previsions as follows: for any gamble
$f$ and any non-empty subset $B$ of $\Omega$, with indicator $I_B$,

$$\overline{P}(f|B) := \inf\{\alpha : I_B(\alpha - f) \in \mathcal{R}\} \tag{4}$$

$$\underline{P}(f|B) := \sup\{\alpha : I_B(f - \alpha) \in \mathcal{R}\}, \tag{5}$$

so $\underline{P}(f|B) = -\overline{P}(-f|B)$, and the lower prevision $\underline{P}(f|B)$ of $f$, conditional on $B$ is the
supremum price $\alpha$ for which the subject will buy the gamble $f$, i.e., accept the gamble
$f - \alpha$, contingent on the occurrence of $B$. Similarly, the upper prevision $\overline{P}(f|B)$ of $f$,
conditional on $B$ is the infimum price $\alpha$ for which the subject will sell the gamble $f$, i.e.,
accept the gamble $\alpha - f$, contingent on the occurrence of $B$.

For any event $A$, we define the conditional lower probability $\underline{P}(A|B) := \underline{P}(I_A|B)$, i.e., the
subject's supremum rate for betting on the event $A$, contingent on the occurrence of $B$, and
similarly for $\overline{P}(A|B) := \overline{P}(I_A|B)$.

We want to stress here that by its definition [Eq. (5)], $\underline{P}(f|B)$ is a conditional lower
prevision on what Walley [34, Section 6.1] has called the contingent interpretation: it is
a supremum acceptable price for buying the gamble $f$ contingent on the occurrence of $B$,
meaning that the subject accepts the contingent gambles $I_B(f - \underline{P}(f|B) + \epsilon)$, $\epsilon > 0$, which
are called off unless $B$ occurs. This should be contrasted with the updating interpretation
for the conditional lower prevision $\underline{P}(f|B)$, which is a subject's present (before the
occurrence of $B$) supremum acceptable price for buying $f$ after receiving the information that $B$
has occurred (and nothing else!). Walley's Updating Principle [34, Section 6.1.6], which
we shall accept, and use further on in Section 4, (essentially) states that conditional lower
previsions should be the same on both interpretations. There is also a third way of looking
at a conditional lower prevision $\underline{P}(f|B)$, which we shall call the dynamic interpretation,
and where $\underline{P}(f|B)$ stands for the subject's supremum acceptable buying price for $f$ after
she gets to know $B$ has occurred. For precise conditional previsions, this last interpretation
seems to be the one considered in [13, 23, 24, 29]. It is far from obvious that there should
be a relation between the first two and the third interpretations. 10 We shall briefly come
back to this distinction in the following sections.

For any partition $\mathcal{B}$ of $\Omega$, we let $\underline{P}(f|\mathcal{B}) := \sum_{B \in \mathcal{B}} I_B \underline{P}(f|B)$ be the gamble on $\Omega$ that in
any element $\omega$ of $B$ assumes the value $\underline{P}(f|B)$, where $B$ is any element of $\mathcal{B}$.

The following properties of conditional lower and upper previsions associated with a
coherent set of really desirable bounded gambles were (essentially) proved by Walley [34],
and by Williams [40]. We give the extension to potentially unbounded gambles:

Proposition 2 (Properties of conditional lower and upper previsions [34]). Consider a
coherent set of really desirable gambles $\mathcal{R}$, let $B$ be any non-empty subset of $\Omega$, and let $f$,
$f_1$ and $f_2$ be gambles on $\Omega$. Then 11

1. $\inf_{\omega \in B} f(\omega) \leq \underline{P}(f|B) \leq \overline{P}(f|B) \leq \sup_{\omega \in B} f(\omega)$ [convexity];

2. $\underline{P}(f_1 + f_2|B) \geq \underline{P}(f_1|B) + \underline{P}(f_2|B)$ [super-additivity];

3. $\underline{P}(\lambda f|B) = \lambda \underline{P}(f|B)$ for all real $\lambda \geq 0$ [non-negative homogeneity];

4. $\underline{P}(f + \alpha|B) = \underline{P}(f|B) + \alpha$ for all real $\alpha$ [constant additivity];

5. $\underline{P}(\alpha|B) = \alpha$ for all real $\alpha$ [normalisation];

6. $f_1 \leq f_2$ implies that $\underline{P}(f_1|B) \leq \underline{P}(f_2|B)$ [monotonicity];

7. if $\mathcal{B}$ is any partition of $\Omega$ that refines the partition $\{B, B^c\}$ and $\mathcal{R}$ is $\mathcal{B}$-conglomerable,
then $\underline{P}(f|B) \geq \underline{P}(\underline{P}(f|\mathcal{B})|B)$ [conglomerative property].

10 In [29], the authors seem to confuse the updating interpretation with the dynamic interpretation when they
claim that "[their new understanding of lower and upper previsions] justifies Peter Walley's updating principle".

11 Here, as in Proposition 1, we implicitly assume that whatever we write down is well-defined, meaning that
for instance no sums of $-\infty$ and $+\infty$ appear, and that the function $\underline{P}(f|\mathcal{B})$ is real-valued, and nowhere infinite.
Shafer and Vovk don't seem to mention the need for this.
The analogy between Propositions 1 and 2 is striking, even if there is an equality in 
Proposition 1.7, where we have only an inequality in Proposition 2.7. 12 In the next section, 
we set out to identify the exact correspondence between the two models. We shall find a 
specific situation where applying Walley's theory leads to equalities rather than the more 
general inequalities of Proposition 2.7. 13 

We now show that there can indeed be a strict inequality in Proposition 2.7. 

Example 2. Consider an urn with red, green and blue balls, from which a ball will be
drawn at random. Our subject is uncertain about the colour of this ball, so $\Omega = \{r,g,b\}$.
Assume that she assesses that she is willing to bet on this colour being red at rates up
to (and including) 1/4, i.e., that she accepts the gamble $I_{\{r\}} - 1/4$. Similarly for the other
two colours, so she also accepts the gambles $I_{\{g\}} - 1/4$ and $I_{\{b\}} - 1/4$. It is not difficult to
prove using the coherence requirements D1-D4 and Eq. (5) that the smallest coherent set
of really desirable gambles $\mathcal{R}$ that includes these assessments satisfies $f \in \mathcal{R} \Leftrightarrow \underline{P}(f) \ge 0$,
where

$$\underline{P}(f) = \frac{3}{4}\,\frac{f(r)+f(g)+f(b)}{3} + \frac{1}{4}\min\{f(r),f(g),f(b)\}.$$

For the partition $\mathcal{B} = \{\{b\},\{r,g\}\}$ (a Daltonist has observed the colour of the ball and tells
the subject about it), it follows from Eq. (5) after some manipulations that

$$\underline{P}(f|\{b\}) = f(b) \quad\text{and}\quad \underline{P}(f|\{r,g\}) = \frac{2}{3}\,\frac{f(r)+f(g)}{2} + \frac{1}{3}\min\{f(r),f(g)\}.$$

If we consider $f = I_{\{g\}}$, then in particular $\underline{P}(\{g\}|\{b\}) = 0$ and $\underline{P}(\{g\}|\{r,g\}) = 1/3$, so
$\underline{P}(\{g\}|\mathcal{B}) = \frac{1}{3}I_{\{r,g\}}$ and therefore

$$\underline{P}(\underline{P}(\{g\}|\mathcal{B})) = \frac{3}{4}\,\frac{1/3+1/3}{3} + \frac{1}{4}\cdot 0 = \frac{1}{6},$$

whereas $\underline{P}(\{g\}) = 1/4$, and therefore $\underline{P}(\{g\}) > \underline{P}(\underline{P}(\{g\}|\mathcal{B}))$. ♦
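
The arithmetic of this example is easy to reproduce. The following sketch (function names are hypothetical) hard-codes the unconditional lower prevision, (3/4) times the mean plus (1/4) times the minimum, and the conditional one on {r, g}, (2/3) times the mean over {r, g} plus (1/3) times the minimum over {r, g}, and then compares the two sides of the conglomerative inequality for the indicator of {g}.

```python
from fractions import Fraction as F


def lower_prevision(f):
    """Unconditional lower prevision: (3/4) * mean + (1/4) * min over {r, g, b}."""
    values = [f["r"], f["g"], f["b"]]
    return F(3, 4) * sum(values) / 3 + F(1, 4) * min(values)


def lower_given_rg(f):
    """Lower prevision conditional on {r, g}: (2/3) * mean + (1/3) * min over {r, g}."""
    return F(2, 3) * (f["r"] + f["g"]) / 2 + F(1, 3) * min(f["r"], f["g"])


def lower_given_b(f):
    """Lower prevision conditional on {b}: the value of the gamble in b."""
    return f["b"]


indicator_g = {"r": F(0), "g": F(1), "b": F(0)}

# The gamble P({g}|B): constant on each element of the partition {{b}, {r, g}}.
conditional_gamble = {
    "r": lower_given_rg(indicator_g),
    "g": lower_given_rg(indicator_g),
    "b": lower_given_b(indicator_g),
}

direct = lower_prevision(indicator_g)            # 1/4
two_step = lower_prevision(conditional_gamble)   # 1/6
print(direct, two_step, direct > two_step)
```

The strict inequality `direct > two_step` is the strict version of statement 7 in Proposition 2.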

The difference $\overline{P}(f|B) - \underline{P}(f|B)$ between infimum selling and supremum buying prices
for gambles $f$ represents imprecision present in our subject's belief model. If we look at
the inequalities in Proposition 2.1, we are led to consider two extreme cases. One extreme
maximises the 'degrees of imprecision' $\overline{P}(f|B) - \underline{P}(f|B)$ by letting $\underline{P}(f|B) = \inf_{\omega\in B} f(\omega)$
and $\overline{P}(f|B) = \sup_{\omega\in B} f(\omega)$. This leads to the so-called vacuous model, corresponding to
$\mathcal{R} = \{f : f \ge 0\}$, and intended to represent complete ignorance on the subject's part.

The other extreme minimises the degrees of imprecision $\overline{P}(f|B) - \underline{P}(f|B)$ by letting
$\underline{P}(f|B) = \overline{P}(f|B)$ everywhere. The common value $P(f|B)$ is then called the prevision, or
fair price, for $f$ conditional on $B$. We call the corresponding functional $P(\cdot|B)$ a (conditional)
linear prevision. Linear previsions are the precise probability models considered by
de Finetti [11]. They of course have all properties of lower and upper previsions listed in
Proposition 2, with equality rather than inequality for statements 2 and 7. The restriction
of a linear prevision to (indicators of) events is a finitely additive probability measure.

4. Connecting the two approaches 

In order to lay bare the connections between the game-theoretic and the behavioural
approach, we enter Shafer and Vovk's world, and consider another player, called Forecaster,
who, in situation □, has certain piece-wise beliefs about what moves Reality will make.



12 Concatenation inequalities for lower prices do appear in the more general context described in [29].

13 This seems to happen generally for what is called marginal extension in a situation of immediate prediction,
meaning that we start out with, and extend, an initial model where we condition on increasingly finer partitions, 
and where the initial conditional model for any partition deals with gambles that are measurable with respect to 
the finer partitions; see [34, Theorem 6.7.2] and [20]. 
4.1. Forecaster's local beliefs. More specifically, for each non-terminal situation $t \in \Omega^\Diamond \setminus \Omega$,
she has beliefs (in situation □) about which move $w$ Reality will choose from the set
$W_t$ of moves available to him if he gets to $t$. We suppose she represents those beliefs
in the form of a coherent$^{14}$ set $\mathcal{R}_t$ of really desirable gambles on $W_t$. These beliefs are
conditional on the updating interpretation, in the sense that they represent Forecaster's
beliefs in situation □ about what Reality will do immediately after he gets to situation $t$.
We call any specification of such coherent $\mathcal{R}_t$, $t \in \Omega^\Diamond \setminus \Omega$, an immediate prediction model
for Forecaster. We want to stress here that $\mathcal{R}_t$ should not be interpreted dynamically, i.e.,
as a set of gambles on $W_t$ that Forecaster accepts in situation $t$.

We shall generally call an event tree, provided with local predictive belief models in
each of the non-terminal situations $t$, an imprecise probability tree. These local belief
models may be coherent sets of really desirable gambles $\mathcal{R}_t$. But they can also be lower
previsions (perhaps derived from such sets $\mathcal{R}_t$). When all such local belief models are
precise previsions, or equivalently (finitely additive) probability measures, we simply get
a probability tree in Shafer's [27, Chapter 3] sense.

4.2. From local to global beliefs. We can now ask ourselves what the behavioural
implications of these conditional assessments $\mathcal{R}_t$ in the immediate prediction model are. For
instance, what do they tell us about whether or not Forecaster should accept certain
gambles$^{15}$ on $\Omega$, the set of possible paths for Reality? In other words, how can these beliefs (in
□) about which next move Reality will make in each non-terminal situation $t$ be combined
coherently into beliefs (in □) about Reality's complete sequence of moves?

In order to investigate this, we use Walley's very general and powerful method of natural
extension, which is just conservative coherent reasoning. We shall construct, using
the local pieces of information $\mathcal{R}_t$, a set of really desirable gambles on $\Omega$ for Forecaster in
situation □ that is (i) coherent, and (ii) as small as possible, meaning that no more gambles
should be accepted than is actually required by coherence.

4.2.1. Collecting the pieces. Consider any non-terminal situation $t \in \Omega^\Diamond \setminus \Omega$ and any
gamble $h_t$ in $\mathcal{R}_t$. With $h_t$ we can associate a $t$-gamble,$^{16}$ also denoted by $h_t$, and defined by

$$h_t(\omega) := h_t(\omega(t))$$

for all $\omega \sqsupseteq t$, where we denote by $\omega(t)$ the unique element of $W_t$ such that $t\omega(t) \sqsubseteq \omega$. The
$t$-gamble $h_t$ is $U$-measurable for any cut $U$ of $t$ that is non-trivial, i.e., such that $U \ne \{t\}$.
This implies that we can interpret $h_t$ as a map on $U$. In fact, we shall even go further,
and associate with the gamble $h_t$ on $W_t$ a $t$-process, also denoted by $h_t$, by letting $h_t(s) :=
h_t(\omega(t))$ for any $s \sqsupset t$, where $\omega$ is any terminal situation that follows $s$; see also Fig. 4.

$I_{\uparrow t}h_t$ represents the gamble on $\Omega$ that is called off unless Reality ends up in situation $t$,
and which, when it isn't called off, depends only on Reality's move immediately after $t$,
and gives the same value $h_t(w)$ to all paths $\omega$ that go through $tw$. The fact that Forecaster,
in situation □, accepts $h_t$ on $W_t$ conditional on Reality's getting to $t$, translates immediately
to the fact that Forecaster accepts the contingent gamble $I_{\uparrow t}h_t$ on $\Omega$, by Walley's Updating
Principle. We thus end up with a set

$$\mathcal{R} := \bigcup_{t \in \Omega^\Diamond \setminus \Omega} \{I_{\uparrow t}h_t : h_t \in \mathcal{R}_t\}$$

of gambles on $\Omega$ that Forecaster accepts in situation □.

The only thing left to do now, is to find the smallest coherent set $\mathcal{E}_{\mathcal{R}}$ of really desirable
gambles that includes $\mathcal{R}$ (if indeed there is any such coherent set). Here we take coherence
to refer to conditions D1-D4, together with D5', a variation on D5 which refers to

14 Since we don't immediately envisage conditioning this local model on subsets of $W_t$, we impose no extra
conglomerability requirements here, only the coherence conditions D1-D4.

15 In Shafer and Vovk's language, gambles are real variables.

16 Just as for variables, we can define a $t$-gamble as a partial gamble whose domain includes $\uparrow t$.






GERT DE COOMAN AND FILIP HERMANS 




FIGURE 4. In a non-terminal situation $t$, we consider a gamble $h_t$ on Reality's
move space $W_t$ that Forecaster accepts, and turn it into a process,
also denoted by $h_t$. The values $h_t(s)$ in situations $s \sqsupset t$ are indicated by
curly arrows.

conglomerability with respect to those partitions that we actually intend to condition on, 
as suggested in Section 3. 

4.2.2. Cut conglomerability. These partitions are what we call cut partitions. Consider
any cut $U$ of the initial situation □. The set of events $\mathcal{B}_U := \{\uparrow u : u \in U\}$ is a partition of
$\Omega$, called the $U$-partition. D5' requires that our set of really desirable gambles should be
cut conglomerable, i.e., conglomerable with respect to every cut partition $\mathcal{B}_U$.$^{17}$

Why do we only require conglomerability for cut partitions? Simply because we are
interested in predictive inference: we eventually will want to find out about the gambles
on $\Omega$ that Forecaster accepts in situation □, conditional (contingent) on Reality getting to
a situation $t$. This is related to finding lower previsions for Forecaster conditional on the
corresponding events $\uparrow t$. A collection $\{\uparrow t : t \in T\}$ of such events constitutes a partition of
the sample space $\Omega$ if and only if $T$ is a cut of □.

Because we require cut conglomerability, it follows in particular that $\mathcal{E}_{\mathcal{R}}$ will contain
the sums of gambles $g := \sum_{u \in U} I_{\uparrow u}h_u$ for all non-terminal cuts $U$ of □ and all choices
of $h_u \in \mathcal{R}_u$, $u \in U$. This is because $I_{\uparrow u}g = I_{\uparrow u}h_u \in \mathcal{R}$ for all $u \in U$. Because moreover
$\mathcal{E}_{\mathcal{R}}$ should be a convex cone [by D3 and D4], any sum of such sums $\sum_{u \in U} I_{\uparrow u}h_u$ over a
finite number of non-terminal cuts $U$ should also belong to $\mathcal{E}_{\mathcal{R}}$. But, since in the case of
bounded protocols we are discussing here, Reality can only make a bounded and finite
number of moves, $\Omega^\Diamond \setminus \Omega$ is a finite union of such non-terminal cuts, and therefore the
sums $\sum_{u \in \Omega^\Diamond \setminus \Omega} I_{\uparrow u}h_u$ should belong to $\mathcal{E}_{\mathcal{R}}$ for all choices $h_u \in \mathcal{R}_u$, $u \in \Omega^\Diamond \setminus \Omega$.

4.2.3. Selections and gamble processes. Consider any non-terminal situation $t$, and call
$t$-selection any partial process $\mathcal{S}$ defined on the non-terminal $s \sqsupseteq t$ such that $\mathcal{S}(s) \in \mathcal{R}_s$.
With a $t$-selection $\mathcal{S}$, we associate a $t$-process $\mathcal{G}^{\mathcal{S}}$ called a gamble process, where

$$\mathcal{G}^{\mathcal{S}}(s) = \sum_{t \sqsubseteq u \sqsubset s} \mathcal{S}(u)(s) \tag{6}$$

in all situations $s \sqsupseteq t$; see also Fig. 5. Alternatively, $\mathcal{G}^{\mathcal{S}}$ is given by the recursion relation

$$\mathcal{G}^{\mathcal{S}}(sw) = \mathcal{G}^{\mathcal{S}}(s) + \mathcal{S}(s)(w), \quad w \in W_s$$

for all non-terminal $s \sqsupseteq t$, with initial value $\mathcal{G}^{\mathcal{S}}(t) = 0$. In particular, this leads to the
$t$-gamble $\mathcal{G}^{\mathcal{S}}_\Omega$ defined on all terminal situations $\omega$ that follow $t$, by letting

$$\mathcal{G}^{\mathcal{S}}_\Omega = \sum_{t \sqsubseteq u,\ u \in \Omega^\Diamond \setminus \Omega} I_{\uparrow u}\,\mathcal{S}(u). \tag{7}$$

Then we have just argued that the gambles $\mathcal{G}^{\mathcal{S}}_\Omega$ should belong to $\mathcal{E}_{\mathcal{R}}$ for all non-terminal
situations $t$ and all $t$-selections $\mathcal{S}$. As before for strategy and capital processes, we call a
□-selection $\mathcal{S}$ simply a selection, and a □-gamble process simply a gamble process.

17 Again, when all of Reality's move spaces $W_t$ are finite, cut conglomerability (D5') is a consequence of
D3, and therefore needs no extra attention. But when some or all move spaces are infinite, then a cut $U$ may
contain an infinite number of elements, and the corresponding cut partition $\mathcal{B}_U$ will then be infinite too, making
cut conglomerability a non-trivial additional requirement.
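
The recursion for a gamble process is straightforward to mechanise on a finite tree. The sketch below is an illustration only, under assumptions that are not part of the text: situations are encoded as tuples of moves, a selection is a dictionary mapping each non-terminal situation to a gamble on the moves available there, and the numerical values are made up. It does not check that the selected gambles are really desirable. It computes the process both by the recursion (start at 0 in the root, add the selected gamble's value at each move) and by the direct sum over strictly preceding situations.

```python
def gamble_process(selection, root=()):
    """Gamble process by recursion: G(root) = 0 and G(s + (w,)) = G(s) + selection[s][w]."""
    values = {root: 0}
    stack = [root]
    while stack:
        s = stack.pop()
        if s not in selection:      # terminal situation: nothing is selected here
            continue
        for w, payoff in selection[s].items():
            child = s + (w,)
            values[child] = values[s] + payoff
            stack.append(child)
    return values


def gamble_process_sum(selection, s, root=()):
    """The same value as a sum over the situations u with root <= u < s."""
    total = 0
    for k in range(len(root), len(s)):
        u = s[:k]                   # situation strictly preceding s
        total += selection[u][s[k]] # value of the gamble selected in u at the move taken
    return total


# Hypothetical two-level tree: after "t" the game stops, after "h" one more move follows.
selection = {
    (): {"h": 1, "t": -1},
    ("h",): {"h": 2, "t": -2},
}

process = gamble_process(selection)
terminal = {s: v for s, v in process.items() if s not in selection}
print(terminal)
```

Restricting the process to the terminal situations, as in the last lines, gives the gamble on paths that Forecaster is committed to accepting.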




Figure 5. The $t$-selection $\mathcal{S}$ in this event tree is a process defined in
the two non-terminal situations $t$ and $s$; it selects, in each of these situations,
a really desirable gamble for Forecaster. The values of the corresponding
gamble process $\mathcal{G}^{\mathcal{S}}$ are indicated by curly arrows.



4.2.4. The Marginal Extension Theorem. It is now but a technical step to prove Theorem 3
below. It is a significant generalisation, in terms of sets of really desirable gambles
rather than coherent lower previsions,$^{18}$ of the Marginal Extension Theorem first proved by
Walley [34, Theorem 6.7.2], and subsequently extended by De Cooman and Miranda [20].

Theorem 3 (Marginal Extension Theorem). There is a smallest set of gambles that satisfies
D1-D4 and D5' and includes $\mathcal{R}$. This natural extension of $\mathcal{R}$ is given by

$$\mathcal{E}_{\mathcal{R}} := \{g : g \ge \mathcal{G}^{\mathcal{S}}_\Omega \text{ for some selection } \mathcal{S}\}.$$

Moreover, for any non-terminal situation $t$ and any $t$-gamble $g$, it holds that $I_{\uparrow t}g \in \mathcal{E}_{\mathcal{R}}$ if
and only if there is some $t$-selection $\mathcal{S}_t$ such that $g \ge \mathcal{G}^{\mathcal{S}_t}_\Omega$, where as before, $g \ge \mathcal{G}^{\mathcal{S}_t}_\Omega$ is
taken to mean that $g(\omega) \ge \mathcal{G}^{\mathcal{S}_t}_\Omega(\omega)$ for all terminal situations $\omega$ that follow $t$.

4.3. Predictive lower and upper previsions. We now use the coherent set of really
desirable gambles $\mathcal{E}_{\mathcal{R}}$ to define special lower previsions $\underline{P}(\cdot|t) := \underline{P}(\cdot|\uparrow t)$ for Forecaster in
situation □, conditional on an event $\uparrow t$, i.e., on Reality getting to situation $t$, as explained
in Section 3.$^{19}$ We shall call such conditional lower previsions predictive lower previsions.
We then get, using Eq. (5) and Theorem 3, that for any non-terminal situation $t$,

$$\underline{P}(f|t) := \sup\{\alpha : I_{\uparrow t}(f - \alpha) \in \mathcal{E}_{\mathcal{R}}\} \tag{8}$$
$$= \sup\{\alpha : f - \alpha \ge \mathcal{G}^{\mathcal{S}}_\Omega \text{ for some } t\text{-selection } \mathcal{S}\}. \tag{9}$$

We also use the notation $\underline{P}(f) := \underline{P}(f|\square) = \sup\{\alpha : f - \alpha \in \mathcal{E}_{\mathcal{R}}\}$. It should be stressed
that Eq. (8) is also valid in terminal situations $t$, whereas Eq. (9) clearly isn't.

Besides the properties in Proposition 2, which hold in general for conditional lower and 
upper previsions, the predictive lower (and upper) previsions we consider here also satisfy 
a number of additional properties, listed in Propositions 4 and 5. 



18 The difference in language may obscure that this is indeed a generalisation. But see Theorem 7 for
expressions in terms of predictive lower previsions that should make the connection much clearer.

19 We stress again that these are conditional lower previsions on the contingent/updating interpretation.
Proposition 4 (Additional properties of predictive lower and upper previsions). Let $t$ be
any situation, and let $f$, $f_1$ and $f_2$ be gambles on $\Omega$.

1. If $t$ is a terminal situation $\omega$, then $\underline{P}(f|\omega) = \overline{P}(f|\omega) = f(\omega)$;

2. $\underline{P}(f|t) = \underline{P}(fI_{\uparrow t}|t)$ and $\overline{P}(f|t) = \overline{P}(fI_{\uparrow t}|t)$;

3. $f_1 \le f_2$ (on $\uparrow t$) implies that $\underline{P}(f_1|t) \le \underline{P}(f_2|t)$ [monotonicity].

Before we go on, there is an important point that must be stressed and clarified. It is an
immediate consequence of Proposition 4.2 that when $f$ and $g$ are any two gambles that
coincide on $\uparrow t$, then $\underline{P}(f|t) = \underline{P}(g|t)$. This means that $\underline{P}(f|t)$ is completely determined by
the values that $f$ assumes on $\uparrow t$, and it allows us to define $\underline{P}(\cdot|t)$ on gambles that are only
necessarily defined on $\uparrow t$, i.e., on $t$-gambles. We shall do so freely in what follows.

For any cut $U$ of a situation $t$, we may define the $t$-gamble $\underline{P}(f|U)$ as the gamble that
assumes the value $\underline{P}(f|u)$ in any $\omega \sqsupseteq u$, where $u \in U$. This $t$-gamble is $U$-measurable by
construction, and it can be considered as a gamble on $U$.

Proposition 5 (Separate coherence). Let $t$ be any situation, let $U$ be any cut of $t$, and let
$f$ and $g$ be $t$-gambles, where $g$ is $U$-measurable.

1. $\underline{P}(\uparrow t|t) = 1$;

2. $\underline{P}(g|U) = g_U$;

3. $\underline{P}(f + g|U) = g_U + \underline{P}(f|U)$;

4. if $g$ is moreover non-negative, then $\underline{P}(gf|U) = g_U\,\underline{P}(f|U)$.

4.4. Correspondence between immediate prediction models and coherent probability
protocols. There appears to be a close correspondence between the expressions [such
as (3)] for lower prices $\underline{E}_t(f)$ associated with coherent probability protocols and those
[such as (9)] for the predictive lower previsions $\underline{P}(f|t)$ based on an immediate prediction
model. Say that a given coherent probability protocol and given immediate prediction
model match whenever they lead to identical corresponding lower prices $\underline{E}_t$ and predictive
lower previsions $\underline{P}(\cdot|t)$ for all non-terminal $t \in \Omega^\Diamond \setminus \Omega$.

The following theorem marks the culmination of our search for the correspondence 
between Walley's, and Shafer and Vovk's approaches to probability theory. 

Theorem 6 (Matching Theorem). For every coherent probability protocol there is an im- 
mediate prediction model such that the two match, and conversely, for every immediate 
prediction model there is a coherent probability protocol such that the two match. 

The ideas underlying the proof of this theorem should be clear. If we have a coherent
probability protocol with move spaces $\mathbf{S}_t$ and gain functions $\lambda_t$ for Sceptic, define the
immediate prediction model for Forecaster to be (essentially) $\mathcal{R}_t := \{-\lambda_t(s,\cdot) : s \in \mathbf{S}_t\}$. If,
conversely, we have an immediate prediction model for Forecaster consisting of the sets $\mathcal{R}_t$,
define the move spaces for Sceptic by $\mathbf{S}_t := \mathcal{R}_t$, and his gain functions by $\lambda_t(h,\cdot) := -h$ for
all $h$ in $\mathcal{R}_t$. We discuss the interpretation of this correspondence in more detail in Section 5.

4.5. Calculating predictive lower previsions using backwards recursion. The Marginal
Extension Theorem allows us to calculate the most conservative global belief model $\mathcal{E}_{\mathcal{R}}$
that corresponds to the local immediate prediction models $\mathcal{R}_t$. Here beliefs are expressed
in terms of sets of really desirable gambles. Can we derive a result that allows us to do
something similar for the corresponding lower previsions?

To see what this question entails, first consider a local model $\mathcal{R}_s$: a set of really desirable
gambles on $W_s$, where $s \in \Omega^\Diamond \setminus \Omega$. Using Eq. (5), we can associate with $\mathcal{R}_s$ a lower
prevision $\underline{P}_s$ on $\mathcal{G}(W_s)$. Each gamble $g_s$ on $W_s$ can be seen as an uncertain reward, whose
outcome $g_s(w)$ depends on the (unknown) move $w \in W_s$ that Reality will make if it gets
to situation $s$. And Forecaster's local (predictive) lower prevision

$$\underline{P}_s(g_s) := \sup\{\alpha : g_s - \alpha \in \mathcal{R}_s\} \tag{10}$$

for $g_s$ is her supremum acceptable price (in □) for buying $g_s$ when Reality gets to $s$.
But as we have seen in Section 4.3, we can also, in each situation $t$, derive global
predictive lower previsions $\underline{P}(\cdot|t)$ for Forecaster from the global model $\mathcal{E}_{\mathcal{R}}$, using Eq. (8).
For each $t$-gamble $f$, $\underline{P}(f|t)$ is Forecaster's inferred supremum acceptable price (in □) for
buying $f$, contingent on Reality getting to $t$.

Is there a way to construct the global predictive lower previsions $\underline{P}(\cdot|t)$ directly from the
local predictive lower previsions $\underline{P}_s$? We can infer that there is from the following theorem,
together with Propositions 8 and 9 below.

Theorem 7 (Concatenation Formula). Consider any two cuts $U$ and $V$ of a situation $t$ such
that $U$ precedes $V$. For all $t$-gambles $f$ on $\Omega$,$^{20}$

1. $\underline{P}(f|t) = \underline{P}(\underline{P}(f|U)|t)$;

2. $\underline{P}(f|U) = \underline{P}(\underline{P}(f|V)|U)$.

To make clear what the following Proposition 8 implies, consider any $t$-selection $\mathcal{S}$, and
define the $U$-called off $t$-selection $\mathcal{S}^U$ as the selection that mimics $\mathcal{S}$ until we get to
$U$, where we begin to select the zero gambles: for any non-terminal situation $s \sqsupseteq t$, let
$\mathcal{S}^U(s) := \mathcal{S}(s)$ if $s$ strictly precedes (some element of) $U$, and let $\mathcal{S}^U(s) := 0 \in \mathcal{R}_s$
otherwise. If we stop the gamble process $\mathcal{G}^{\mathcal{S}}$ at the cut $U$, we readily infer from Eq. (6)
that for the $U$-stopped process $U(\mathcal{G}^{\mathcal{S}})$

$$U(\mathcal{G}^{\mathcal{S}}) = \mathcal{G}^{\mathcal{S}^U} \text{ and therefore, also using Eq. (1), } \mathcal{G}^{\mathcal{S}}_U = \mathcal{G}^{\mathcal{S}^U}_\Omega. \tag{11}$$

We see that stopped gamble processes are gamble processes themselves, that correspond
to selections being 'called off' as soon as Reality reaches a cut. This also means that we
can actually restrict ourselves to selections $\mathcal{S}$ that are $U$-called off in Proposition 8.

Proposition 8. Let $t$ be a non-terminal situation, and let $U$ be a cut of $t$. Then for any
$U$-measurable $t$-gamble $f$, $I_{\uparrow t}f \in \mathcal{E}_{\mathcal{R}}$ if and only if there is some $t$-selection $\mathcal{S}$ such that
$I_{\uparrow t}f \ge \mathcal{G}^{\mathcal{S}^U}_\Omega$, or equivalently, $f_U \ge \mathcal{G}^{\mathcal{S}}_U$. Consequently, by Eq. (8),

$$\underline{P}(f|t) = \sup\{\alpha : f_U - \alpha \ge \mathcal{G}^{\mathcal{S}}_U \text{ for some } t\text{-selection } \mathcal{S}\}.$$

If a $t$-gamble $h$ is measurable with respect to the children cut $C(t)$ of a non-terminal
situation $t$, then we can interpret it as a gamble on $W_t$. For such gambles, the following immediate
corollary of Proposition 8 tells us that the predictive lower previsions $\underline{P}(h|t)$ are completely
determined by the local model $\mathcal{R}_t$.

Proposition 9. Let $t$ be a non-terminal situation, and consider a $C(t)$-measurable gamble
$h$. Then $\underline{P}(h|t) = \underline{P}_t(h)$.

These results tell us that all predictive lower (and upper) previsions can be calculated
using backwards recursion, by starting with the trivial predictive previsions $\underline{P}(f|\Omega) =
\overline{P}(f|\Omega) = f$ for the terminal cut $\Omega$, and using only the local models $\underline{P}_s$. This is illustrated
in the following simple example. We shall come back to this idea in Section 8.

Example 3. Suppose we have $n > 0$ coins. We begin by flipping the first coin: if we
get tails, we stop, and otherwise we flip the second coin. Again, we stop if we get tails,
and otherwise we flip the third coin, and so on. In other words, we continue flipping new coins
until we get one tails, or until all $n$ coins have been flipped. This leads to the event tree
depicted in Fig. 6. Its sample space is $\Omega = \{t_1, t_2, \ldots, t_n, h_n\}$. We will also consider the cuts
$U_1 = \{t_1, h_1\}$ of □, $U_2 = \{t_2, h_2\}$ of $h_1$, $U_3 = \{t_3, h_3\}$ of $h_2$, ..., and $U_n = \{t_n, h_n\}$ of $h_{n-1}$.
It will be convenient to also introduce the notation $h_0$ for the initial situation □.

20 Here too, it is implicitly assumed that all expressions are well-defined, e.g., that in the second statement,
$\underline{P}(f|v)$ is a real number for all $v \in V$, making sure that $\underline{P}(f|V)$ is indeed a gamble.
For each of the non-terminal situations $h_k$, $k = 0, 1, \ldots, n-1$, Forecaster has beliefs (in
□) about what move Reality will make in that situation, i.e., about the outcome of the $(k+1)$-th
coin flip. These beliefs are expressed in terms of a set of really desirable gambles $\mathcal{R}_{h_k}$
on Reality's move space $W_{h_k}$ in $h_k$. Each such move space $W_{h_k}$ can clearly be identified
with the children cut $U_{k+1}$ of $h_k$.

For the purpose of this example, it will be enough to consider the local predictive lower
previsions $\underline{P}_{h_k}$ on $\mathcal{G}(U_{k+1})$, associated with $\mathcal{R}_{h_k}$ through Eq. (10). Forecaster assumes all
coins to be approximately fair, in the sense that she assesses that the probability of heads
for each flip lies between $\frac{1}{2} - \delta$ and $\frac{1}{2} + \delta$, for some $0 < \delta < \frac{1}{2}$. This assessment leads to
the following local predictive lower previsions:$^{21}$

$$\underline{P}_{h_k}(g) = (1 - 2\delta)\left[\frac{1}{2}g(h_{k+1}) + \frac{1}{2}g(t_{k+1})\right] + 2\delta\min\{g(h_{k+1}), g(t_{k+1})\}, \tag{12}$$

where $g$ is any gamble on $U_{k+1}$.

Let us see how we can for instance calculate, from the local predictive models $\underline{P}_{h_k}$, the
predictive lower probabilities $\underline{P}(\{h_n\}|s)$ for any situation $s$ in the tree.
First of all, for the terminal situations it is clear from Proposition 4.1 that

$$\underline{P}(\{h_n\}|t_k) = 0 \quad\text{and}\quad \underline{P}(\{h_n\}|h_n) = 1. \tag{13}$$

We now turn to the calculation of $\underline{P}(\{h_n\}|h_{n-1})$. It follows at once from Proposition 9 that
$\underline{P}(\{h_n\}|h_{n-1}) = \underline{P}_{h_{n-1}}(\{h_n\})$, and therefore, substituting $g = I_{\{h_n\}}$ in Eq. (12) for $k = n-1$,

$$\underline{P}(\{h_n\}|h_{n-1}) = \frac{1}{2} - \delta. \tag{14}$$

To calculate $\underline{P}(\{h_n\}|h_{n-2})$, consider that, since $U_{n-1}$ is a cut of $h_{n-2}$,

$$\underline{P}(\{h_n\}|h_{n-2}) = \underline{P}(\underline{P}(\{h_n\}|U_{n-1})|h_{n-2}) = \underline{P}_{h_{n-2}}(\underline{P}(\{h_n\}|U_{n-1})),$$

where the first equality follows from Theorem 7, and the second from Proposition 9,
taking into account that $g_{n-1} := \underline{P}(\{h_n\}|U_{n-1})$ is a gamble on the children cut $U_{n-1}$ of
$h_{n-2}$. It follows from Eq. (13) that $g_{n-1}(t_{n-1}) = \underline{P}(\{h_n\}|t_{n-1}) = 0$ and from Eq. (14) that
$g_{n-1}(h_{n-1}) = \underline{P}(\{h_n\}|h_{n-1}) = \frac{1}{2} - \delta$. Substituting $g = g_{n-1}$ in Eq. (12) for $k = n-2$, we
then find that

$$\underline{P}(\{h_n\}|h_{n-2}) = \left(\frac{1}{2} - \delta\right)^2. \tag{15}$$



21 These so-called linear-vacuous mixtures, or contamination models, are the natural extensions of the
probability assessments $\underline{P}_{h_k}(\{h_{k+1}\}) = \frac{1}{2} - \delta$ and $\overline{P}_{h_k}(\{h_{k+1}\}) = \frac{1}{2} + \delta$; see [34, Chapters 3-4] for more details.
Repeating this course of reasoning, we find that more generally

$$\underline{P}(\{h_n\}|h_k) = \left(\frac{1}{2} - \delta\right)^{n-k}, \quad k = 0, \ldots, n-1. \tag{16}$$

This illustrates how we can use a backwards recursion procedure to calculate global from
local predictive lower previsions.$^{22}$
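
The backwards recursion of this example can be sketched in a few lines. The code below is an illustration, not part of the original development: the function names are hypothetical, the local model is the linear-vacuous mixture for an approximately fair coin (weight 1 - 2δ on the average of the two outcomes and weight 2δ on their minimum), and exact fractions are used so that the result can be compared with the closed form (1/2 - δ)^(n-k) without rounding.

```python
from fractions import Fraction as F


def local_lower(g_heads, g_tails, delta):
    """Local lower prevision of a gamble on one flip: (1 - 2d) * average + 2d * minimum."""
    return (1 - 2 * delta) * (g_heads + g_tails) / 2 + 2 * delta * min(g_heads, g_tails)


def lower_prob_all_heads(n, delta):
    """Lower probabilities of ending in h_n, given h_k, for k = 0, ..., n.

    Works backwards from the terminal situations: the value is 1 in h_n and 0 in
    every terminal t_k, and each step applies the local model in h_k to the values
    already computed for its two children h_{k+1} and t_{k+1}.
    """
    values = [None] * (n + 1)
    values[n] = F(1)
    for k in range(n - 1, -1, -1):
        values[k] = local_lower(values[k + 1], F(0), delta)
    return values


n, delta = 4, F(1, 10)
values = lower_prob_all_heads(n, delta)
closed_form = [(F(1, 2) - delta) ** (n - k) for k in range(n + 1)]
print(values == closed_form)
```

Each step only needs the local model in one situation and the values one level further down the tree, which is what keeps the computation linear in the depth of this tree.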

5. Interpretation of the Matching Theorem 

In Shafer and Vovk's approach, there sometimes also appears, besides Reality and Sceptic,
a third player, called Forecaster. Her role consists in determining what Sceptic's move
space $\mathbf{S}_t$ and gain function $\lambda_t$ are, in each non-terminal situation $t$. Shafer and Vovk leave
largely unspecified just how Forecaster should do that, which makes their approach quite
general and abstract.

But the Matching Theorem now tells us that we can connect their approach with Walley's,
and therefore inject a notion of belief modelling into their game-theoretic framework.
We can do that by being more specific about how Forecaster should determine Sceptic's
move spaces $\mathbf{S}_t$ and gain functions $\lambda_t$: they should be determined by Forecaster's beliefs
(in □) about what Reality will do immediately after getting to non-terminal situations $t$.$^{23}$
Let us explain this more carefully.

Suppose that Forecaster has certain beliefs, in situation □, about what move Reality
will make next in each non-terminal situation $t$, and suppose she models those beliefs by
specifying a coherent set $\mathcal{R}_t$ of really desirable gambles on $W_t$. This brings us to the
situation described in the previous section.

When Forecaster specifies such a set, she is making certain behavioural commitments:
she is committing herself to accepting, in situation □, any gamble in $\mathcal{R}_t$, contingent on
Reality getting to situation $t$, and to accepting any combination of such gambles according
to the combination axioms D3, D4 and D5'. This implies that we can derive predictive
lower previsions $\underline{P}(\cdot|t)$, with the following interpretation: in situation □, $\underline{P}(f|t)$ is the
supremum price Forecaster can be made to buy the $t$-gamble $f$ for, conditional on Reality's
getting to $t$, and on the basis of the commitments she has made in the initial situation □.

What Sceptic can now do, is take Forecaster up on her commitments. This means that
in situation □, he can use a selection $\mathcal{S}$, which for each non-terminal situation $t$ selects a
gamble (or equivalently, any non-negative linear combination of gambles) $\mathcal{S}(t) = h_t$ in $\mathcal{R}_t$,
and offer the corresponding gamble $\mathcal{G}^{\mathcal{S}}_\Omega$ on $\Omega$ to Forecaster, who is bound to accept it. If
Reality's next move in situation $t$ is $w \in W_t$, this changes Sceptic's capital by (the positive
or negative amount) $-h_t(w)$. In other words, his move space $\mathbf{S}_t$ can then be identified with
the convex set of gambles $\mathcal{R}_t$ and his gain function $\lambda_t$ is then given by $\lambda_t(h_t,\cdot) = -h_t$. But
then the selection $\mathcal{S}$ can be identified with a strategy $\mathcal{P}$ for Sceptic, and $\mathcal{K}^{\mathcal{P}} = -\mathcal{G}^{\mathcal{S}}$
(this is the essence of the proof of Theorem 6), which tells us that we are led to a coherent
probability protocol, and that the corresponding lower prices $\underline{E}_t$ for Sceptic coincide with
Forecaster's predictive lower previsions $\underline{P}(\cdot|t)$.

In a very nice paper [29], Shafer, Gillett and Scherl discuss ways of introducing and
interpreting lower previsions in a game-theoretic framework, not in terms of prices that
a subject is willing to pay for a gamble, but in terms of whether a subject believes she
can make a lot of money (utility) at those prices. They consider such conditional lower
previsions both on a contingent and on a dynamic interpretation, and argue that there is
equality between them in certain cases. Here, we have decided to stick to the more usual
interpretation of lower and upper previsions, and concentrated on the contingent/updating
interpretation. We see that on our approach, the game-theoretic framework is useful too.

22 It also indicates why we need to work in the more general language of lower previsions and gambles, rather
than the perhaps more familiar one of lower probabilities and events: even if we only want to calculate a global
predictive lower probability, already after one recursion step we need to start working with lower previsions of
gambles. More discussion on the prevision/gamble versus probability/event issue can be found in [34, Chapter 4].

23 The germ for this idea, in the case that Forecaster's beliefs can be expressed using precise probability
models on the $\mathcal{G}(W_t)$, is already present in Shafer's work, see for instance [30, Chapter 8] and [25, Appendix 1].
We extend this idea here to Walley's imprecise probability models.

This is of particular relevance to the laws of large numbers that Shafer and Vovk derive 
in their game-theoretic framework, because such laws can now be given a behavioural 
interpretation in terms of Forecaster's predictive lower and upper previsions. To give an 
example, we now turn to deriving a very general weak law of large numbers. 



6. A more general weak law of large numbers

Consider a non-terminal situation $t$ and a cut $U$ of $t$. Define the $t$-variable $n_U$ such
that $n_U(\omega)$ is the distance $d(t,u)$, measured in moves along the tree, from $t$ to the unique
situation $u$ in $U$ that $\omega$ goes through. $n_U$ is clearly $U$-measurable, and $n_U(u)$ is simply the
distance $d(t,u)$ from $t$ to $u$. We assume that $n_U(u) > 0$ for all $u \in U$, or in other words that
$U \ne \{t\}$. Of course, in the bounded protocols we are considering here, $n_U$ is bounded, and
we denote its minimum by $N_U$.

Now consider for each $s$ between $t$ and $U$ a bounded gamble $h_s$ and a real number
$m_s$ such that $h_s - m_s \in \mathcal{R}_s$, meaning that Forecaster in situation □ accepts to buy $h_s$ for
$m_s$, contingent on Reality getting to situation $s$. Let $B > 0$ be any common upper bound
for $\sup h_s - \inf h_s$, for all $t \sqsubseteq s \sqsubset U$. It follows from the coherence of $\mathcal{R}_s$ [D1] that $m_s \le
\sup h_s$. To make things interesting, we shall also assume that $\inf h_s \le m_s$, because otherwise
$h_s - m_s > 0$ and accepting this gamble represents no real commitment on Forecaster's part.
As a result, we see that $|h_s - m_s| \le \sup h_s - \inf h_s \le B$.

We are interested in the following $t$-gamble $G_U$, given by

$$G_U = \frac{1}{n_U}\sum_{t \sqsubseteq s \sqsubset U} I_{\uparrow s}[h_s - m_s],$$

which provides a measure for how much, on average, the gambles $h_s$ yield an outcome
above Forecaster's accepted buying prices $m_s$, along segments of the tree starting in $t$ and
ending right before $U$. In other words, $G_U$ measures the average gain for Forecaster along
segments from $t$ to $U$, associated with commitments she has made and is taken up on,
because Reality has to move along these segments. This gamble $G_U$ is $U$-measurable too.
We may therefore interpret $G_U$ as a gamble on $U$. Also, for any $h_s$ and any $u \in U$, we know
that because $s \sqsubset u$, $h_s$ has the same value $h_s(u) := h_s(\omega(s))$ in all $\omega$ that go through $u$. This
allows us to write

$$G_U(u) = \frac{1}{n_U(u)}\sum_{t \sqsubseteq s \sqsubset u}[h_s(u) - m_s].$$

We would like to study Forecaster's beliefs (in the initial situation □ and contingent on
Reality getting to $t$) in the occurrence of the event

$$\{G_U \ge -\epsilon\} := \{\omega \in \uparrow t : G_U(\omega) \ge -\epsilon\},$$

where $\epsilon > 0$. In other words, we want to know $\underline{P}(\{G_U \ge -\epsilon\}|t)$, which is Forecaster's
supremum rate for betting on the event that her average gain from $t$ to $U$ will be at least
$-\epsilon$, contingent on Reality's getting to $t$.



Theorem 10 (Weak Law of Large Numbers). For all $\epsilon > 0$,

$$\underline{P}(\{G_U \ge -\epsilon\}|t) \ge 1 - \exp\left(-\frac{N_U\epsilon^2}{4B^2}\right).$$

We see that as $N_U$ increases this lower bound increases to one, so the theorem can be very
loosely formulated as follows: As the horizon recedes, Forecaster, if she is coherent, should
believe increasingly more strongly that her average gain along any path from the present
to the horizon won't be negative.
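
To get a feeling for how quickly the bound 1 - exp(-N ε² / (4 B²)) approaches one, it can simply be tabulated. The sketch below is illustrative only: the function name and the chosen values of N, ε and B are assumptions, not taken from the text.

```python
import math


def wlln_lower_bound(n_min, eps, bound):
    """Lower bound 1 - exp(-N * eps^2 / (4 * B^2)) on the lower probability.

    n_min plays the role of the minimum distance N to the cut, eps is the tolerance
    on the average gain, and bound is the common bound B on sup h - inf h.
    """
    if n_min <= 0 or eps <= 0 or bound <= 0:
        raise ValueError("n_min, eps and bound must all be positive")
    return 1.0 - math.exp(-n_min * eps ** 2 / (4.0 * bound ** 2))


# Hypothetical horizons for a tolerance of 0.1 and gambles with range at most 1.
for n_min in (100, 400, 1600, 6400):
    print(n_min, round(wlln_lower_bound(n_min, 0.1, 1.0), 4))
```

Because ε enters squared and is divided by 4B², halving the tolerance requires a horizon four times as long to reach the same bound.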
This is a very general version of the weak law of large numbers. It can be seen as
a generalisation of Hoeffding's inequality for martingale differences [14] (see also [38,
Chapter 4] and [31, Appendix A.7]) to coherent lower previsions on event trees.

7. Scoring a predictive model 

We now look at an interesting consequence of Theorem 10: we shall see that it can be
used to score a predictive model in a manner that satisfies Dawid's Prequential Principle
[5, 6]. We consider the special case of Theorem 10 where $t = \square$.

Suppose Reality follows a path up to some situation $u_o$ in $U$, which leads to an average
gain $G_U(u_o)$ for Forecaster. Suppose this average gain is negative: $G_U(u_o) < 0$. We see
that $\uparrow u_o \subseteq \{G_U < -\epsilon\}$ for all $0 < \epsilon < -G_U(u_o)$, and therefore all these events $\{G_U < -\epsilon\}$
have actually occurred (because $\uparrow u_o$ has). On the other hand, Forecaster's upper probability
(in □) for their occurrence satisfies $\overline{P}(\{G_U < -\epsilon\}) \le \exp\left(-\frac{N_U\epsilon^2}{4B^2}\right)$, by Theorem 10.
Coherence then tells us that Forecaster's upper probability (in □) for the event $\uparrow u_o$, which
has actually occurred, is then at most $S_{N_U}(\gamma_U(u_o))$, where

$$S_N(x) = \exp\left(-\frac{N}{4}x^2\right) \quad\text{and}\quad \gamma_U(u) := \frac{G_U(u)}{B}.$$

Observe that $\gamma_U(u_o)$ is a number in $[-1,0)$, by assumption. Coherence requires that Forecaster,
because of her local predictive commitments, can be forced (by Sceptic, if he
chooses his strategy well) to bet against the occurrence of the event $\uparrow u_o$ at a rate that is
at least $1 - S_{N_U}(\gamma_U(u_o))$. So we see that Forecaster is losing utility because of her local
predictive commitments. Just how much depends on how close $\gamma_U(u_o)$ lies to $-1$, and on
how large $N_U$ is; see Fig. 7.




The upper bound $S_{N_U}(\gamma_U(u_o))$ we have constructed for the upper probability of $\uparrow u_o$
has a very interesting property, which we now try to make more explicit. Indeed, if we
were to calculate Forecaster's upper probability $\overline{P}(\uparrow u_o)$ for $\uparrow u_o$ directly using Eq. (9), this
value would generally depend on Forecaster's predictive assessments $\mathcal{R}_s$ for situations $s$
that don't precede $u_o$, and that Reality therefore never got to. We shall see that such is not
the case for the upper bound $S_{N_U}(\gamma_U(u_o))$ constructed using Theorem 10.

Consider any situation $s$ before $U$ but not on the path through $u_o$, meaning that Reality
never got to this situation $s$. Therefore the corresponding gamble $h_s - m_s$ in the expression
for $G_U$ isn't used in calculating the value of $G_U(u_o)$, so we can change it to anything else,
and still obtain the same value of $G_U(u_o)$.

Indeed, consider any other predictive model, where the only thing we ask is that the
$\mathcal{R}'_s$ coincide with the $\mathcal{R}_s$ for all $s$ that precede $u_o$. For other $s$, the $\mathcal{R}'_s$ can be chosen
arbitrarily, but still coherently. Now construct a new average gain gamble $G'_U$ for this
alternative predictive model, where the only restriction is that we let $h'_s = h_s$ and $m'_s = m_s$
if $s$ precedes $u_o$. We know from the reasoning above that $G'_U(u_o) = G_U(u_o)$, so the new
upper probability that the event $\uparrow u_o$ will be observed is at most

\[ S_{N_U}\left(\frac{G'_U(u_o)}{B}\right) = S_{N_U}\left(\frac{G_U(u_o)}{B}\right) = S_{N_U}(\gamma_U(u_o)). \]

In other words, the upper bound $S_{N_U}(\gamma_U(u_o))$ we found for Forecaster's upper probability of
Reality getting to a situation $u_o$ depends only on Forecaster's local predictive assessments
$\mathcal{R}_s$ for situations $s$ that Reality has actually got to, and not on her assessments for other
situations. This means that this method for scoring a predictive model satisfies Dawid's
Prequential Principle; see for instance [5, 6].
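
The prequential character of this score can be made concrete with a small sketch: the function below only takes the realised gains $h_s(u_o)$ and the lower previsions $m_s$ along the observed path, plus the constants $N_U$ and $B$. The function name `prequential_score`, its argument names, and the convention of returning the trivial bound 1 for a non-negative average gain are assumptions made for illustration.

```python
import math
from typing import Sequence


def prequential_score(gains: Sequence[float], lower_previsions: Sequence[float],
                      n_min: int, bound: float) -> float:
    """Upper bound S_{N_U}(gamma_U(u_o)) for the upper probability of the observed path.

    gains            : realised values h_s(u_o), for the visited situations s only
    lower_previsions : the matching m_s, Forecaster's lower previsions of the h_s
    n_min            : N_U, the minimum number of steps to the cut U (assumed known)
    bound            : B > 0, with |h_s - m_s| <= B
    """
    if len(gains) != len(lower_previsions) or not gains:
        raise ValueError("need two non-empty sequences of equal length")
    # Average gain G_U(u_o), computed from visited situations only.
    avg_gain = sum(h - m for h, m in zip(gains, lower_previsions)) / len(gains)
    if avg_gain >= 0:
        return 1.0  # assumed convention: only the trivial bound applies here
    gamma = avg_gain / bound  # gamma_U(u_o), a number in [-1, 0)
    return math.exp(-n_min * gamma ** 2 / 4.0)


# Hypothetical path of four steps on which Forecaster loses 0.5 on average.
print(prequential_score([-0.5] * 4, [0.0] * 4, n_min=4, bound=1.0))
```

Nothing about situations that Reality never reached enters the computation, which is exactly what the Prequential Principle asks for.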



8. Concatenation and backwards recursion 

As we have discovered in Section 4.5, Theorem 7 and Proposition 9 enable us to calculate
the global predictive lower previsions $\underline{P}(\cdot|t)$ in imprecise probability trees from local
predictive lower previsions $\underline{P}_s$, $s \sqsupseteq t$, using a backwards recursion method. That this
is possible in probability trees, where the probability models are precise (previsions), is
well-known,^24 and was arguably discovered by Christiaan Huygens in the middle of the
17th century.^25 It allows for an exponential, dynamic programming-like reduction in the
complexity of calculating previsions (or expectations); it seems to be essentially this
phenomenon that leads to the computational efficiency of such machine learning tools as, for
instance, Needleman and Wunsch's [21] sequence alignment algorithm.

In this section, we want to give an illustration of such exponential reduction in
complexity, by looking at a problem involving Markov chains. Assume that the state $X(n)$ of a
system at consecutive times $n = 1,2,\dots,N$ can assume any value in a finite set $\mathcal{X}$.
Forecaster has some beliefs about the state $X(1)$ at time 1, leading to a coherent lower prevision
$\underline{P}_1$ on $\mathcal{L}(\mathcal{X})$. She also assesses that when the system jumps from state $X(n) = x_n$ to a new
state $X(n+1)$, where the system goes to will only depend on the state $X(n)$ the system was
in at time $n$, and not on the states $X(k)$ of the system at previous times $k = 1,2,\dots,n-1$.
Her beliefs about where the system in $X(n) = x_n$ will go to at time $n+1$ are represented by
a lower prevision $\underline{P}_{x_n}$ on $\mathcal{L}(\mathcal{X})$.

The time evolution of this system can be modelled as Reality traversing an event tree.
An example of such a tree for $\mathcal{X} = \{a,b\}$ and $N = 3$ is given in Fig. 8. The situations of
the tree have the form $(x_1,\dots,x_k) \in \mathcal{X}^k$, $k = 0,1,\dots,N$; for $k = 0$ this gives some abuse
of notation as we let $\mathcal{X}^0 := \{\square\}$. In each cut $X^k := \mathcal{X}^k$ of □, the value $X(k)$ of the state
at time $k$ is revealed.

This leads to an imprecise probability tree with local predictive models

\[ \underline{P}_\square := \underline{P}_1 \quad\text{and}\quad \underline{P}_{(x_1,\dots,x_k)} := \underline{P}_{x_k}, \quad k = 1,\dots,N-1, \tag{17} \]

expressing the usual Markov conditional independence condition, but here in terms of
lower previsions. For notational convenience, we now introduce a (generally non-linear)
transition operator $T$ on the linear space $\mathcal{L}(\mathcal{X})$ as follows:

\[ T\colon \mathcal{L}(\mathcal{X}) \to \mathcal{L}(\mathcal{X})\colon f \mapsto T(f), \quad\text{where } T(f)(x) := \underline{P}_x(f), \]

or in other words, $T(f)$ is a gamble on $\mathcal{X}$ whose value $T(f)(x)$ in the state $x \in \mathcal{X}$ is given
by $\underline{P}_x(f)$. The transition operator $T$ completely describes Forecaster's beliefs about how
the system changes its state from one instant to the next.

FIGURE 8. The event tree for the time evolution of a system that can be in
two states, a and b, and can change state at each time instant n = 1, 2, 3.
Also depicted are the respective cuts $X^1$ and $X^2$ of □ where the states at
times 1 and 2 are revealed.

^24 See Chapter 3 of Shafer's book [27] on causal reasoning in probability trees. This chapter contains a number
of propositions about calculating probabilities and expectations in probability trees that find their generalisations
in Sections 4.3 and 4.5. For instance, Theorem 7 generalises Proposition 3.11 in [27] to imprecise probability
trees.

^25 See Appendix A of Shafer's book [27]. Shafer discusses Huygens's treatment of a special case of the
so-called Problem of Points, where Huygens draws what is probably the first recorded probability tree, and solves the
problem by backwards calculation of expectations in the tree. Huygens's treatment can be found in Appendix VI
of [15].

We now want to find the corresponding model for Forecaster's beliefs (in □) about the
state the system will be in at time $n$. So let us consider a gamble $f_n$ on $\mathcal{X}^N$ that actually
only depends on the value $X(n)$ of $X$ at this time $n$. We then want to calculate its lower
prevision $\underline{P}(f_n) := \underline{P}(f_n|\square)$.

Consider a time instant $k \in \{0,1,\dots,n-1\}$, and a situation $(x_1,\dots,x_k) \in \mathcal{X}^k$. For
the children cut $C(x_1,\dots,x_k) := \{(x_1,\dots,x_k,x_{k+1}) : x_{k+1} \in \mathcal{X}\}$ of $(x_1,\dots,x_k)$, we see that
$\underline{P}(f_n|C(x_1,\dots,x_k))$ is a gamble that only depends on the value of $X(k+1)$ in $\mathcal{X}$, and
whose value in $x_{k+1}$ is given by $\underline{P}(f_n|x_1,\dots,x_{k+1})$. We then find that

\[ \underline{P}(f_n|x_1,\dots,x_k) = \underline{P}(\underline{P}(f_n|C(x_1,\dots,x_k))|x_1,\dots,x_k) = \underline{P}_{x_k}(\underline{P}(f_n|C(x_1,\dots,x_k))), \tag{18} \]

where the first equality follows from Theorem 7, and the second from Proposition 9 and
Eq. (17). We first apply Eq. (18) for $k = n-1$. By Proposition 5.2, $\underline{P}(f_n|C(x_1,\dots,x_{n-1})) = f_n$,
so we are led to $\underline{P}(f_n|x_1,\dots,x_{n-1}) = \underline{P}_{x_{n-1}}(f_n) = T(f_n)(x_{n-1})$, and therefore

\[ \underline{P}(f_n|C(x_1,\dots,x_{n-2})) = T(f_n). \]

Substituting this in Eq. (18) for $k = n-2$ yields $\underline{P}(f_n|x_1,\dots,x_{n-2}) = \underline{P}_{x_{n-2}}(T(f_n))$, and
therefore

\[ \underline{P}(f_n|C(x_1,\dots,x_{n-3})) = T^2(f_n). \]

Proceeding in this fashion until we get to $k = 1$, we get $\underline{P}(f_n|C(\square)) = T^{n-1}(f_n)$, and going
one step further to $k = 0$, Eq. (18) yields $\underline{P}(f_n|\square) = \underline{P}_\square(\underline{P}(f_n|C(\square)))$ and therefore

\[ \underline{P}(f_n) = \underline{P}_1(T^{n-1}(f_n)). \tag{19} \]

We see that the complexity of calculating $\underline{P}(f_n)$ in this way is essentially linear in the
number of time steps $n$.
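
The recursion leading to Eq. (19) can be sketched in a few lines. The sketch below assumes, purely for illustration, that each local lower prevision is specified as the lower envelope of a finite list of probability mass functions; the names `lower_prevision`, `transition` and `lower_prevision_at_time`, and all numbers in the toy model, are hypothetical.

```python
from typing import Dict, List

Gamble = Dict[str, float]        # a gamble on the state space, as {state: value}
MassFunction = Dict[str, float]  # a probability mass function on the state space


def lower_prevision(vertices: List[MassFunction], g: Gamble) -> float:
    """Lower envelope: the smallest expectation of g over the given mass functions."""
    return min(sum(p[x] * g[x] for x in g) for p in vertices)


def transition(f: Gamble, local: Dict[str, List[MassFunction]]) -> Gamble:
    """The transition operator T: the value of T(f) in x is the local lower prevision of f."""
    return {x: lower_prevision(local[x], f) for x in local}


def lower_prevision_at_time(n: int, f: Gamble, initial: List[MassFunction],
                            local: Dict[str, List[MassFunction]]) -> float:
    """Eq. (19): apply T exactly n - 1 times, then the initial lower prevision."""
    g = dict(f)
    for _ in range(n - 1):
        g = transition(g, local)
    return lower_prevision(initial, g)


# Hypothetical two-state model: each model is given by two candidate mass functions.
initial = [{"a": 0.4, "b": 0.6}, {"a": 0.5, "b": 0.5}]
local = {
    "a": [{"a": 0.6, "b": 0.4}, {"a": 0.8, "b": 0.2}],
    "b": [{"a": 0.1, "b": 0.9}, {"a": 0.3, "b": 0.7}],
}
indicator_a = {"a": 1.0, "b": 0.0}
print(lower_prevision_at_time(3, indicator_a, initial, local))
```

Each time step costs one application of `transition`, so the work grows linearly with the number of time steps, as stated above.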

In the literature on imprecise probability models for Markov chains [2, 17, 32, 33],
another so-called credal set, or set of probabilities, approach is generally used to calculate
$\underline{P}(f_n)$. The point we want to make here is that such an approach typically has a worse
(exponential) complexity in the number of time steps. To see this, recall [34] that a lower
prevision $\underline{P}$ on $\mathcal{L}(\mathcal{X})$ that is derived from a coherent set of really desirable gambles,
corresponds to a convex closed set $\mathcal{M}(\underline{P})$ of probability mass functions $p$ on $\mathcal{X}$, called a
credal set, and given by

\[ \mathcal{M}(\underline{P}) := \{p : (\forall g \in \mathcal{L}(\mathcal{X}))\, \underline{P}(g) \le E_p(g)\} \]

where we let $E_p(g) := \sum_{x \in \mathcal{X}} p(x)g(x)$ be the expectation of the gamble $g$ associated with
the mass function $p$; $E_p$ is a linear prevision in the language of Section 3.2. It then also
holds that for all gambles $g$ on $\mathcal{X}$,

\[ \underline{P}(g) = \min\{E_p(g) : p \in \mathcal{M}(\underline{P})\} = \min\{E_p(g) : p \in \mathrm{ext}\,\mathcal{M}(\underline{P})\} \]

where $\mathrm{ext}\,\mathcal{M}(\underline{P})$ is the set of extreme points of the convex closed set $\mathcal{M}(\underline{P})$. Typically
on this approach, $\mathrm{ext}\,\mathcal{M}(\underline{P})$ is assumed to be finite, and then $\mathcal{M}(\underline{P})$ is called a finitely
generated credal set. See for instance [3, 4] for a discussion of credal sets with applications
to Bayesian networks.

Then $\underline{P}(f_n)$ can also be calculated as follows:^26 Choose for each non-terminal situation
$t = (x_1,\dots,x_k) \in \mathcal{X}^k$, $k = 0,1,\dots,n-1$ a mass function $p_t$ in the set $\mathcal{M}(\underline{P}_t)$ given by
Eq. (17), or equivalently, in its set of extreme points $\mathrm{ext}\,\mathcal{M}(\underline{P}_t)$. This leads to a (precise)
probability tree for which we can calculate the corresponding expectation of $f_n$. Then
$\underline{P}(f_n)$ is the minimum of all such expectations, calculated for all possible assignments of
mass functions to the nodes. We see that, roughly speaking, when all $\mathcal{M}(\underline{P}_t)$ have a typical
number of extreme points $M$, then the complexity of calculating $\underline{P}(f_n)$ will be essentially
$M^n$, i.e., exponential in the number of time steps.
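
To see the blow-up at work, here is a naive sketch of the credal set procedure just described: it enumerates every assignment of an extreme point to every non-terminal situation and keeps the smallest expectation. The helper names and the toy model are hypothetical, and this plain enumeration over all nodes of the tree is only one (deliberately unoptimised) way of organising the computation.

```python
from itertools import product
from typing import Dict, List, Tuple

Gamble = Dict[str, float]
MassFunction = Dict[str, float]


def brute_force_lower_prevision(n: int, f: Gamble, initial: List[MassFunction],
                                local: Dict[str, List[MassFunction]]) -> Tuple[float, int]:
    """Minimum expectation of f(X(n)) over all precise trees, and the number of trees tried."""
    states = list(f)
    # All non-terminal situations (x_1, ..., x_k) for k = 0, ..., n - 1.
    situations = [s for k in range(n) for s in product(states, repeat=k)]
    # Candidate extreme points per situation: the initial model in the root, else local[x_k].
    choices = [initial if s == () else local[s[-1]] for s in situations]

    def expectation(assignment, situation=()):
        # Backwards calculation of the expectation in one precise probability tree.
        if len(situation) == n:
            return f[situation[-1]]
        p = assignment[situation]
        return sum(p[x] * expectation(assignment, situation + (x,)) for x in states)

    best, count = float("inf"), 0
    for combination in product(*choices):
        count += 1
        best = min(best, expectation(dict(zip(situations, combination))))
    return best, count


# Hypothetical two-state model with two extreme points per local credal set.
initial = [{"a": 0.4, "b": 0.6}, {"a": 0.5, "b": 0.5}]
local = {
    "a": [{"a": 0.6, "b": 0.4}, {"a": 0.8, "b": 0.2}],
    "b": [{"a": 0.1, "b": 0.9}, {"a": 0.3, "b": 0.7}],
}
print(brute_force_lower_prevision(3, {"a": 1.0, "b": 0.0}, initial, local))
```

Already for three time steps and two states, this enumeration visits one precise tree per combination of extreme points over seven non-terminal situations, and the count grows at least exponentially with the number of time steps.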

This shows that the 'lower prevision' approach can for some problems lead to more 
efficient algorithms than the 'credal set' approach. This may be especially relevant for 
probabilistic inferences involving graphical models, such as credal networks [3, 4]. An- 
other nice example of this phenomenon, concerned with checking coherence for precise 
and imprecise probability models, is due to Walley et al. [37]. 

9. Additional Remarks 

We have proved the correspondence between the two approaches only for event trees 
with a bounded horizon. For games with infinite horizon, the correspondence becomes 
less immediate, because Shafer and Vovk implicitly make use of coherence axioms that 
are stronger than D1-D4 and D5', leading to lower prices that dominate the corresponding 
predictive lower previsions. Exact matching would be restored of course, provided we 
could argue that these additional requirements are rational for any subject to comply with. 
This could be an interesting topic for further research. 

We haven't paid much attention to the special case that the coherent lower previsions
and their conjugate upper previsions coincide, and are therefore (precise) previsions or fair
prices in de Finetti's [11] sense. When all the local predictive models $\underline{P}_t$ (see Proposition 9)
happen to be precise, meaning that $\underline{P}_t(f) = \overline{P}_t(f) = -\underline{P}_t(-f)$ for all gambles $f$ on $\mathbf{W}_t$,
then the immediate prediction model we have described in Section 4 becomes very closely
related, and arguably identical to, the probability trees introduced and studied by Shafer
in [27]. Indeed, we then get predictive previsions $P(\cdot|s)$ that can be obtained through
concatenation of the local models $P_t$, as guaranteed by Theorem 7.^27

Moreover, as indicated in Section 8, it is possible to prove lower envelope theorems
to the effect that (i) the local lower previsions $\underline{P}_t$ correspond to lower envelopes of sets
$\mathcal{M}_t$ of local previsions $P_t$; (ii) each possible choice of previsions $P_t$ in $\mathcal{M}_t$ over all
non-terminal situations $t$, leads to a compatible probability tree in Shafer's [27] sense, with
corresponding predictive previsions $P(\cdot|s)$; and (iii) the predictive lower previsions $\underline{P}(\cdot|s)$
are the lower envelopes of the predictive previsions $P(\cdot|s)$ for the compatible probability
trees. Of course, the law of large numbers of Section 6 remains valid for probability trees.

^26 An explicit proof of this statement would take us too far, but it is an immediate application of Theorems 3
and 4 in [20].

^27 This should for instance be compared with Proposition 3.11 in [27].

Finally, we want to recall that Theorem 7 and Proposition 9 allow for a calculation of the 
predictive models $\underline{P}(\cdot|s)$ using only the local models and backwards recursion, in a manner
that is strongly reminiscent of dynamic programming techniques. This should allow for a 
much more efficient computation of such predictive models than, say, an approach that 
exploits lower envelope theorems and sets of probabilities/previsions. We think that there 
may be lessons to be learnt from this for dealing with other types of graphical models, such 
as credal networks [3, 4], as well. 

What makes this more efficient approach possible is, ultimately, the Marginal Extension 
Theorem (Theorem 3), which leads to the Concatenation Formula (Theorem 7), i.e., to 
the specific equality, rather than the general inequalities, in Proposition 2.7. Generally 
speaking (see for instance [34, Section 6.7] and [20]), such marginal extension results can 
be proved because the models that Forecaster specifies are local, or immediate prediction 
models: they relate to her beliefs, in each non-terminal situation t, about what move Reality 
is going to make immediately after getting to t . 

Appendix A. Proofs of main results

In this Appendix, we have gathered proofs for the most important results in the paper. 

We begin with a proof of Proposition 2. Although similar results were proved for 
bounded gambles by Walley [34], and by Williams [40] before him, our proof also works 
for the extension to possibly unbounded gambles we are considering in this paper. 

Proof of Proposition 2. For the first statement, we only give a proof for the first two
inequalities. The proof for the remaining inequality is similar. For the first inequality,
we may assume without loss of generality that $\inf\{f(\omega) : \omega \in B\} > -\infty$ and is therefore
a real number, which we denote by $\beta$. So we know that $I_B(f - \beta) \ge 0$ and therefore
$I_B(f - \beta) \in \mathcal{R}$, by D2. It then follows from Eq. (5) that $\beta \le \underline{P}(f|B)$. To prove the second
inequality, assume ex absurdo that $\overline{P}(f|B) < \underline{P}(f|B)$, then it follows from Eqs. (4) and (5)
that there are real $\alpha$ and $\beta$ such that $\beta < \alpha$, $I_B(f - \alpha) \in \mathcal{R}$ and $I_B(\beta - f) \in \mathcal{R}$. By D3,
$I_B(\beta - \alpha) = I_B(f - \alpha) + I_B(\beta - f) \in \mathcal{R}$, but this contradicts D1, since $I_B(\beta - \alpha) < 0$.

We now turn to the second statement. As announced in Footnote 11, we may assume
that the sum of the terms $\underline{P}(f_1|B)$ and $\underline{P}(f_2|B)$ is well-defined. If either of these terms is
equal to $-\infty$, the resulting inequality then holds trivially, so we may assume without loss
of generality that both terms are strictly greater than $-\infty$. Consider any real $\alpha < \underline{P}(f_1|B)$
and $\beta < \underline{P}(f_2|B)$, then by Eq. (5) we see that both $I_B(f_1 - \alpha) \in \mathcal{R}$ and $I_B(f_2 - \beta) \in \mathcal{R}$.
Hence $I_B[(f_1 + f_2) - (\alpha + \beta)] \in \mathcal{R}$, by D3, and therefore $\underline{P}(f_1 + f_2|B) \ge \alpha + \beta$, using
Eq. (5) again. Taking the supremum over all real $\alpha < \underline{P}(f_1|B)$ and $\beta < \underline{P}(f_2|B)$ leads to
the desired inequality.

To prove the third statement, first consider $\lambda > 0$. Since by D4, $I_B(\lambda f - \alpha) \in \mathcal{R}$ if and
only if $I_B(f - \alpha/\lambda) \in \mathcal{R}$, we get, using Eq. (5),

\[ \underline{P}(\lambda f|B) = \sup\{\alpha : I_B(\lambda f - \alpha) \in \mathcal{R}\} = \sup\{\lambda\beta : I_B(f - \beta) \in \mathcal{R}\} = \lambda\underline{P}(f|B). \]

For $\lambda = 0$, consider that $\underline{P}(0|B) = \sup\{\alpha : -I_B\alpha \in \mathcal{R}\} = 0$, where the last equality
follows from D1 and D2.

For the fourth statement, use Eq. (5) to find that

\[ \underline{P}(f + \alpha|B) = \sup\{\beta : I_B(f + \alpha - \beta) \in \mathcal{R}\} = \sup\{\alpha + \gamma : I_B(f - \gamma) \in \mathcal{R}\} = \alpha + \underline{P}(f|B). \]

The fifth statement is an immediate consequence of the first.

To prove the sixth statement, observe that $f_1 \le f_2$ implies that $I_B(f_2 - f_1) \ge 0$ and
therefore $I_B(f_2 - f_1) \in \mathcal{R}$, by D2. Now consider any real $\alpha$ such that $I_B(f_1 - \alpha) \in \mathcal{R}$, then
by D3, $I_B(f_2 - \alpha) = I_B(f_1 - \alpha) + I_B(f_2 - f_1) \in \mathcal{R}$. Hence

\[ \{\alpha : I_B(f_1 - \alpha) \in \mathcal{R}\} \subseteq \{\alpha : I_B(f_2 - \alpha) \in \mathcal{R}\} \]

and by taking suprema and considering Eq. (5), we deduce that indeed $\underline{P}(f_1|B) \le \underline{P}(f_2|B)$.

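For the final statement, assume that $\underline{P}(f|C)$ is a real number for all $C \in \mathcal{B}$. Also observe
that $\underline{P}(f|D) = \underline{P}(fI_D|D)$ for all non-empty $D$. Define the gamble $g$ as follows: $g(\omega) :=
\underline{P}(f|C)$ for all $\omega \in C$, where $C \in \mathcal{B}$. We have to prove that $\underline{P}(g|B) \le \underline{P}(f|B)$. We may
assume without loss of generality that $\underline{P}(g|B) > -\infty$ [because otherwise the inequality
holds trivially]. Fix $\varepsilon > 0$, and consider the gamble $I_B(f - g + \varepsilon)$. Also consider any
$C \in \mathcal{B}$. If $C \subseteq B$ then $I_C I_B(f - g + \varepsilon) = I_C(f - \underline{P}(f|C) + \varepsilon) \in \mathcal{R}$, using Eq. (5). If $C \cap B = \emptyset$
then again $I_C I_B(f - g + \varepsilon) = 0 \in \mathcal{R}$, by D2. Since $\mathcal{R}$ is $\mathcal{B}$-conglomerable, it follows that
$I_B(f - g + \varepsilon) \in \mathcal{R}$, whence $\underline{P}(f - g|B) \ge -\varepsilon$, again using Eq. (5). Hence $\underline{P}(h|B) \ge 0$,
where $h := f - g$. Consequently,

\[ \underline{P}(f|B) = \underline{P}(h + g|B) \ge \underline{P}(h|B) + \underline{P}(g|B) \ge \underline{P}(g|B), \]

where we use the second statement, and the fact that $\underline{P}(g|B) > -\infty$ and $\underline{P}(h|B) \ge 0$ implies
that the sum on the right-hand side of the inequality is well-defined as an extended real
number. □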

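Proof of Theorem 3. We have already argued that any coherent set of really desirable
gambles that includes $\mathcal{R}$, must contain all gambles $\mathcal{G}^{\mathcal{S}}_\Omega$ [by D3 and D5']. By D2 and D3, it
must therefore include the set $\mathcal{E}_{\mathcal{R}}$. If we can show that $\mathcal{E}_{\mathcal{R}}$ is coherent, i.e., satisfies D1-D4
and D5', then we have proved that $\mathcal{E}_{\mathcal{R}}$ is the natural extension of $\mathcal{R}$. This is what we now
set out to do.

We first show that D1 is satisfied. It clearly suffices to show that for no selection $\mathcal{S}$, it
holds that $\mathcal{G}^{\mathcal{S}}_\Omega < 0$. This follows at once from Lemma 12 below.

To prove that D2 holds, consider the selection $\mathcal{S}_0 := 0$, then $\mathcal{G}^{\mathcal{S}_0}_\Omega = 0$, and if $f \ge 0$ it
follows that $f \ge \mathcal{G}^{\mathcal{S}_0}_\Omega$, whence indeed $f \in \mathcal{E}_{\mathcal{R}}$.

To prove that D3 and D4 hold, consider any $f_1$ and $f_2$ in $\mathcal{E}_{\mathcal{R}}$, and any non-negative real
numbers $a_1$ and $a_2$. We know there are selections $\mathcal{S}_1$ and $\mathcal{S}_2$ such that $f_1 \ge \mathcal{G}^{\mathcal{S}_1}_\Omega$ and
$f_2 \ge \mathcal{G}^{\mathcal{S}_2}_\Omega$. But $a_1\mathcal{S}_1 + a_2\mathcal{S}_2$ is a selection as well [because the $\mathcal{R}_t$ satisfy D3 and D4],
and $\mathcal{G}^{a_1\mathcal{S}_1 + a_2\mathcal{S}_2}_\Omega = a_1\mathcal{G}^{\mathcal{S}_1}_\Omega + a_2\mathcal{G}^{\mathcal{S}_2}_\Omega \le a_1 f_1 + a_2 f_2$, whence indeed $a_1 f_1 + a_2 f_2 \in \mathcal{E}_{\mathcal{R}}$.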

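To conclude, we show that D5' is satisfied. Consider any cut $U$ of □. Consider a gamble
$f$ and assume that $I_{\uparrow u} f \in \mathcal{E}_{\mathcal{R}}$ for all $u \in U$. We must prove that $f \in \mathcal{E}_{\mathcal{R}}$. Let $U_t := U \cap \Omega$ and
$U_{nt} := U \setminus \Omega$, so $U$ is the disjoint union of $U_t$ and $U_{nt}$. For $\omega \in U_t$, $I_{\uparrow\omega} f = I_{\{\omega\}} f(\omega) \in \mathcal{E}_{\mathcal{R}}$
implies that $f(\omega) \ge 0$, by D1. For $u \in U_{nt}$, we invoke Lemma 13 to find that there is some
$u$-selection $\mathcal{S}_u$ such that $I_{\uparrow u} f \ge \mathcal{G}^{\mathcal{S}_u}_\Omega$. Now construct a selection $\mathcal{S}$ as follows. Consider
any $s$ in $\Omega^\lozenge \setminus \Omega$. If $u \sqsubseteq s$ for some [unique, because $U$ is a cut] $u \in U_{nt}$, let $\mathcal{S}(s) := \mathcal{S}_u(s)$.
Otherwise let $\mathcal{S}(s) := 0$. Then

\[ \mathcal{G}^{\mathcal{S}}_\Omega = \sum_{u \in U_{nt}} I_{\uparrow u}\mathcal{G}^{\mathcal{S}_u}_\Omega \le \sum_{u \in U_{nt}} I_{\uparrow u} f \le \sum_{u \in U} I_{\uparrow u} f = f, \]

so indeed $f \in \mathcal{E}_{\mathcal{R}}$; the first equality can be seen as immediate, or as a consequence of
Lemma 11, and the second inequality holds because we have just shown that $f(\omega) \ge 0$ for
all $\omega \in U_t$. The rest of the proof now follows from Lemma 13. □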

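Lemma 11. Let $t$ be any non-terminal situation, and let $U$ be any cut of $t$. Consider a
$t$-selection $\mathcal{S}$, and let, for any $u \in U \setminus \Omega$, $\mathcal{S}_u$ be the $u$-selection given by $\mathcal{S}_u(s) = \mathcal{S}(s)$ if
the non-terminal situation $s$ follows $u$, and $\mathcal{S}_u(s) := 0$ otherwise. Moreover, let $\mathcal{S}^U$ be the
$U$-called off $t$-selection for $\mathcal{S}$ (as defined after Theorem 7). Then

\[ \mathcal{G}^{\mathcal{S}}_\Omega = \sum_{u \in U \cap \Omega} I_{\uparrow u}\mathcal{G}^{\mathcal{S}}(u) + \sum_{u \in U \setminus \Omega} I_{\uparrow u}\left[\mathcal{G}^{\mathcal{S}}(u) + \mathcal{G}^{\mathcal{S}_u}_\Omega\right] \]
\[ = \mathcal{G}^{\mathcal{S}}_U + \sum_{u \in U \setminus \Omega} I_{\uparrow u}\mathcal{G}^{\mathcal{S}_u}_\Omega = \mathcal{G}^{\mathcal{S}^U}_\Omega + \sum_{u \in U \setminus \Omega} I_{\uparrow u}\mathcal{G}^{\mathcal{S}_u}_\Omega. \]

Proof. It is immediate that the second equality holds; see Eq. (11) for the third. For the
first equality, it obviously suffices to consider the values of the left- and right-hand sides in
any $\omega \in \uparrow u$ for $u \in U \setminus \Omega$. The value of the right-hand side is then, using Eqs. (6) and (7),

\[ \mathcal{G}^{\mathcal{S}}(u) + \mathcal{G}^{\mathcal{S}_u}_\Omega(\omega) = \sum_{t \sqsubseteq s \sqsubset u} \mathcal{S}(s)(u(s)) + \sum_{u \sqsubseteq s \sqsubset \omega} \mathcal{S}(s)(\omega(s)) = \sum_{t \sqsubseteq s \sqsubset \omega} \mathcal{S}(s)(\omega(s)). \qquad \square \]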

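Lemma 12. Consider any non-terminal situation $t$ and any $t$-selection $\mathcal{S}$. Then it doesn't
hold that $\mathcal{G}^{\mathcal{S}}_\Omega < 0$ (on $\uparrow t$). As a corollary, consider any cut $U$ of $t$, and the gamble $\mathcal{G}^{\mathcal{S}}_U$ on
$U$ defined by $\mathcal{G}^{\mathcal{S}}_U(u) = \mathcal{G}^{\mathcal{S}}(u)$. Then it doesn't hold that $\mathcal{G}^{\mathcal{S}}_U < 0$ (on $U$).

Proof. Define the set $P_{\mathcal{S}} := \{s \in \Omega^\lozenge \setminus \Omega : t \sqsubseteq s \text{ and } \mathcal{S}(s) > 0\}$, and its (relative)
complement $N_{\mathcal{S}} := \{s \in \Omega^\lozenge \setminus \Omega : t \sqsubseteq s \text{ and } \mathcal{S}(s) \not> 0\}$. If $N_{\mathcal{S}} = \emptyset$ then $\mathcal{G}^{\mathcal{S}}_\Omega > 0$, by Eq. (7), so
we can assume without loss of generality that $N_{\mathcal{S}}$ is non-empty. Consider any minimal
element $t_1$ of $N_{\mathcal{S}}$, meaning that there is no $s$ in $N_{\mathcal{S}}$ such that $s \sqsubset t_1$ [there is such a
minimal element in $N_{\mathcal{S}}$ because of the bounded horizon assumption]. So for all $t \sqsubseteq s \sqsubset t_1$ we
have that $\mathcal{S}(s) > 0$. Choose $w_1$ in $\mathbf{W}_{t_1}$ such that $\mathcal{S}(t_1)(w_1) > 0$ [this is possible because
$\mathcal{R}_{t_1}$ satisfies D1]. This brings us to the situation $t_2 := t_1 w_1$. If $t_2 \in N_{\mathcal{S}}$, then choose $w_2$
in $\mathbf{W}_{t_2}$ such that $\mathcal{S}(t_2)(w_2) > 0$ [again possible by D1]. If $t_2 \in P_{\mathcal{S}}$ then we know that
$\mathcal{S}(t_2)(w_2) > 0$ for any choice of $w_2$ in $\mathbf{W}_{t_2}$. We can continue in this way until we reach
a terminal situation $\omega = t_1 w_1 w_2 \cdots$ after a finite number of steps [because of the bounded
horizon assumption]. Moreover

\[ \mathcal{G}^{\mathcal{S}}_\Omega(\omega) = \sum_{t \sqsubseteq s \sqsubset t_1} \mathcal{S}(s)(\omega(s)) + \sum_{k} \mathcal{S}(t_k)(w_k) \ge 0 + \mathcal{S}(t_1)(w_1) + 0 > 0. \]

It therefore can't hold that $\mathcal{G}^{\mathcal{S}}_\Omega < 0$ (on $\uparrow t$).

To prove the second statement, consider the $U$-called off $t$-selection $\mathcal{S}^U$ derived from
$\mathcal{S}$ by letting $\mathcal{S}^U(s) := \mathcal{S}(s)$ if $s$ (follows $t$ and) strictly precedes some $u$ in $U$, and zero
otherwise. Then $\mathcal{G}^{\mathcal{S}^U}_\Omega(\omega) = \sum_{t \sqsubseteq s \sqsubset u} \mathcal{S}(s)(u(s)) = \mathcal{G}^{\mathcal{S}}_U(\omega)$ for all $\omega$ that go through $u$, where
$u \in U$ [see also Eq. (11)]. Now apply the above result for the $t$-selection $\mathcal{S}^U$. □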

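Lemma 13. Consider any non-terminal situation $t$ and any gamble $f$. Then $I_{\uparrow t} f \in \mathcal{E}_{\mathcal{R}}$ if
and only if there is some $t$-selection $\mathcal{S}_t$ such that $I_{\uparrow t} f \ge \mathcal{G}^{\mathcal{S}_t}_\Omega$ (on $\uparrow t$).

Proof. It clearly suffices to prove the necessity part. Assume therefore that $I_{\uparrow t} f \in \mathcal{E}_{\mathcal{R}}$,
meaning [definition of the set $\mathcal{E}_{\mathcal{R}}$] that there is some selection $\mathcal{S}$ such that $I_{\uparrow t} f \ge \mathcal{G}^{\mathcal{S}}_\Omega$.
Let $\mathcal{S}_t$ be the $t$-selection defined by letting $\mathcal{S}_t(s) := \mathcal{S}(s)$ if $t \sqsubseteq s$, and zero otherwise. It
follows from Lemma 11 [use the cut of □ made up of $t$ and the terminal situations that do
not follow $t$] that

\[ I_{\uparrow t} f \ge \mathcal{G}^{\mathcal{S}}_\Omega = I_{\uparrow t}\left[\mathcal{G}^{\mathcal{S}}(t) + \mathcal{G}^{\mathcal{S}_t}_\Omega\right] + \sum_{\omega \notin \uparrow t} I_{\{\omega\}}\mathcal{G}^{\mathcal{S}}(\omega), \]

whence, for all $\omega \in \Omega$,

\[ \mathcal{G}^{\mathcal{S}}_\Omega(\omega) \le 0, \quad \omega \notin \uparrow t \tag{20} \]
\[ \mathcal{G}^{\mathcal{S}}(t) + \mathcal{G}^{\mathcal{S}_t}_\Omega(\omega) \le f(\omega), \quad \omega \in \uparrow t. \tag{21} \]

Then, by (21), the proof is complete if we can prove that $\mathcal{G}^{\mathcal{S}}(t) \ge 0$. Assume ex absurdo
that $\mathcal{G}^{\mathcal{S}}(t) < 0$. Consider the cut of □ made up of $t$ and the terminal situations that don't
follow $t$. Applying Lemma 12 for this cut and for the initial situation □, we see that there
must be some $\omega \in \Omega \setminus \uparrow t$ such that $\mathcal{G}^{\mathcal{S}}_\Omega(\omega) > 0$. But this contradicts (20). □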

Proof of Proposition 4. For the first statement, consider a terminal situation $\omega$ and a
gamble $f$ on $\Omega$. Then $\uparrow\omega = \{\omega\}$ and therefore $I_{\uparrow\omega}(f - \alpha) = I_{\{\omega\}}(f(\omega) - \alpha) \in \mathcal{E}_{\mathcal{R}}$ if and
only if $\alpha \le f(\omega)$, by D1 and D2. Using Eq. (8), we find that indeed $\underline{P}(f|\omega) = f(\omega)$. By
conjugacy, $\overline{P}(f|\omega) = -\underline{P}(-f|\omega) = -(-f(\omega)) = f(\omega)$ as well.

For the second statement, use Eq. (8) and observe that $I_{\uparrow t}(f - \alpha) = I_{\uparrow t}(fI_{\uparrow t} - \alpha)$. The
last statement is an immediate consequence of the second and Proposition 2.6. □

Proof of Proposition 5. The first statement follows from Eq. (8) if we observe that
$I_{\uparrow t}(I_{\uparrow t} - \alpha) = I_{\uparrow t}(1 - \alpha) \in \mathcal{E}_{\mathcal{R}}$ if and only if $\alpha \le 1$, by D1 and D2.

For the second statement, consider any $u \in U$, then we must show that $\underline{P}(g|u) = g_U(u)$.
But the $U$-measurability of $g$ tells us that $I_{\uparrow u}(g - \alpha) = I_{\uparrow u}(g_U(u) - \alpha)$, and this gamble
belongs to $\mathcal{E}_{\mathcal{R}}$ if and only if $\alpha \le g_U(u)$, by D1 and D2. Now use Eq. (8).

The proofs of the third and fourth statements are similar, and based on the observation
that $I_{\uparrow u}(f + g - \alpha) = I_{\uparrow u}(f + g_U(u) - \alpha)$ and $I_{\uparrow u}(gf - \alpha) = I_{\uparrow u}(g_U(u)f - \alpha)$. □

Proof of Theorem 6. First, consider an immediate prediction model $\mathcal{R}_t$, $t \in \Omega^\lozenge \setminus \Omega$. Define
Sceptic's move spaces to be $\mathbf{S}_t := \mathcal{R}_t$ and his gain functions $\lambda_t\colon \mathbf{S}_t \times \mathbf{W}_t \to \mathbb{R}$ by $\lambda_t(h,w) :=
-h(w)$ for all $h \in \mathcal{R}_t$ and $w \in \mathbf{W}_t$. Clearly P1 and P2 are satisfied, because each $\mathcal{R}_t$ is a
convex cone by D3 and D4. But so is the coherence requirement C. Indeed, if it weren't
satisfied there would be some non-terminal situation $t$ and some gamble $h$ in $\mathcal{R}_t$ such that
$h(w) < 0$ for all $w$ in $\mathbf{W}_t$, contradicting the coherence requirement D1 for $\mathcal{R}_t$. We are
thus led to a coherent probability protocol. We show there is matching. Consider any
non-terminal situation $t$, and any $t$-selection $\mathcal{S}$. For all terminal situations $\omega \sqsupseteq t$,

\[ \sum_{t \sqsubseteq u \sqsubset \omega} \mathcal{S}(u)(\omega(u)) = \sum_{t \sqsubseteq u \sqsubset \omega} -\lambda_u(\mathcal{S}(u),\omega(u)) = -\mathcal{K}^{\mathcal{S}}(\omega), \]

or in other words, selections and strategies are in a one-to-one correspondence (are actually
the same things), and the corresponding gamble and capital processes are each other's
inverses. It is therefore immediate from Eqs. (3) and (9) that $\underline{E}_t = \underline{P}(\cdot|t)$.

Conversely, consider a coherent probability protocol with move spaces $\mathbf{S}_t$ and gain
functions $\lambda_t\colon \mathbf{S}_t \times \mathbf{W}_t \to \mathbb{R}$ for all non-terminal $t$. Define $\mathcal{R}'_t := \{-\lambda_t(s,\cdot) : s \in \mathbf{S}_t\}$. By a similar
argument to the one above, we see that $\underline{P}'(\cdot|t) = \underline{E}_t$, where the $\underline{P}'(\cdot|t)$ are the predictive lower
previsions associated with the sets $\mathcal{R}'_t$. But each $\mathcal{R}'_t$ is a convex cone of gambles by P1 and
P2, and by C we know that for all non-terminal situations $t$ and all gambles $h$ in $\mathcal{R}'_t$ there
is some $w$ in $\mathbf{W}_t$ such that $h(w) \ge 0$. This means that the conditions for Lemma 14 are
satisfied, and therefore also $\underline{P}'(\cdot|t) = \underline{P}(\cdot|t)$, where the $\underline{P}(\cdot|t)$ are the predictive lower
previsions associated with the immediate prediction model $\mathcal{R}_t$ that is the smallest convex cone
containing all non-negative gambles and including $\{-\lambda_t(s,\cdot) + \delta : s \in \mathbf{S}_t, \delta > 0\}$. □

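Lemma 14. Consider, for each non-terminal situation $t \in \Omega^\lozenge \setminus \Omega$, a set of gambles $\mathcal{R}'_t$ on
$\mathbf{W}_t$ such that (i) $\mathcal{R}'_t$ is a convex cone, and (ii) for all $h \in \mathcal{R}'_t$ there is some $w$ in $\mathbf{W}_t$ such that
$h(w) \ge 0$. Then each set $\mathcal{R}_t := \{\alpha(h + \delta) + f : h \in \mathcal{R}'_t, \delta > 0, f \ge 0, \alpha \ge 0\}$ is a coherent
set of really desirable gambles on $\mathbf{W}_t$. Moreover, all predictive lower previsions obtained
using the sets $\mathcal{R}_t$ coincide with the ones obtained using the $\mathcal{R}'_t$.

Proof. Fix a non-terminal situation $t$. We first show that $\mathcal{R}_t$ is a coherent set of really
desirable gambles, i.e., that D1-D4 are satisfied. Observe that $\mathcal{R}_t$ is the smallest convex
cone of gambles including the set $\{h + \delta : h \in \mathcal{R}'_t, \delta > 0\}$ and containing all non-negative
gambles. So D2-D4 are satisfied. To prove that D1 holds, consider any $g < 0$ and assume
ex absurdo that $g \in \mathcal{R}_t$. Then there are $h$ in $\mathcal{R}'_t$, $\delta > 0$, $f \ge 0$ and $\alpha \ge 0$ such that $0 > g =
\alpha(h + \delta) + f$, whence $\alpha(h + \delta) < 0$ and therefore $\alpha > 0$ and $h + \delta < 0$. But by (ii), there
is some $w$ in $\mathbf{W}_t$ such that $h(w) \ge 0$, whence $h(w) + \delta > 0$. This contradicts $h + \delta < 0$.

We now move to the second part. Consider any gamble $f$ on $\Omega$. Fix $t$ in $\Omega^\lozenge \setminus \Omega$ and
$\varepsilon > 0$. First consider any $t$-selection $\mathcal{S}'$ associated with the $\mathcal{R}'_s$, i.e., such that $\mathcal{S}'(s) \in \mathcal{R}'_s$
for all $s \sqsupseteq t$. Since Reality can only make a finite and bounded number of moves, whatever
happens, it is possible to choose $\delta_s > 0$ for each non-terminal $s \sqsupseteq t$ such that $\sum_{t \sqsubseteq s \sqsubset \omega} \delta_s < \varepsilon$
for all $\omega$ in $\Omega$ that follow $t$. Define the $t$-selection $\mathcal{S}$ associated with the $\mathcal{R}_s$ by $\mathcal{S}(s) :=
\mathcal{S}'(s) + \delta_s \in \mathcal{R}_s$ for all non-terminal $s$ that follow $t$. Clearly $\mathcal{G}^{\mathcal{S}}_\Omega \le \varepsilon + \mathcal{G}^{\mathcal{S}'}_\Omega$, and therefore

\[ \underline{P}'(f|t) = \sup_{\mathcal{S}'}\sup\{\alpha : f - \alpha \ge \mathcal{G}^{\mathcal{S}'}_\Omega\} \le \sup_{\mathcal{S}}\sup\{\alpha : f - \alpha + \varepsilon \ge \mathcal{G}^{\mathcal{S}}_\Omega\} \]
\[ = \sup_{\mathcal{S}}\sup\{\alpha : f - \alpha \ge \mathcal{G}^{\mathcal{S}}_\Omega\} + \varepsilon \le \underline{P}(f|t) + \varepsilon. \]

Since this inequality holds for all $\varepsilon > 0$, we find that $\underline{P}'(f|t) \le \underline{P}(f|t)$.

Conversely, consider any $t$-selection $\mathcal{S}$ associated with the $\mathcal{R}_s$. For all $s \sqsupseteq t$, we have
that there are $h_s$ in $\mathcal{R}'_s$, $\delta_s > 0$, $f_s \ge 0$ and $\alpha_s \ge 0$ such that $\mathcal{S}(s) = \alpha_s(h_s + \delta_s) + f_s$. Define
the $t$-selection $\mathcal{S}'$ associated with the $\mathcal{R}'_s$ by $\mathcal{S}'(s) := \alpha_s h_s = \mathcal{S}(s) - \alpha_s\delta_s - f_s \le \mathcal{S}(s)$.
Clearly then also $\mathcal{G}^{\mathcal{S}'}_\Omega \le \mathcal{G}^{\mathcal{S}}_\Omega$, and therefore

\[ \underline{P}(f|t) = \sup_{\mathcal{S}}\sup\{\alpha : f - \alpha \ge \mathcal{G}^{\mathcal{S}}_\Omega\} \le \sup_{\mathcal{S}'}\sup\{\alpha : f - \alpha \ge \mathcal{G}^{\mathcal{S}'}_\Omega\} \le \underline{P}'(f|t). \]

This proves that indeed $\underline{P}'(f|t) = \underline{P}(f|t)$. □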

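Proof of Theorem 7. It isn't difficult to see that the second statement is a consequence of
the first, so we only prove the first statement.

Consider any $t$-gamble $f$ on $\Omega$. Recall that it is implicitly assumed that $\underline{P}(f|U)$ is again
a $t$-gamble. Then we have to prove that $\underline{P}(f|t) = \underline{P}(\underline{P}(f|U)|t)$. Let, for ease of notation,
$g := \underline{P}(f|U)$, so the $t$-gamble $g$ is $U$-measurable, and we have to prove that $\underline{P}(f|t) = \underline{P}(g|t)$.
Now, there are two possibilities.

First, if $t$ is a terminal situation $\omega$, then, on the one hand, $\underline{P}(f|t) = f(\omega)$ by
Proposition 4.1. On the other hand, again by Proposition 4.1,

\[ \underline{P}(g|t) = g(\omega) = \underline{P}(f|U)(\omega). \]

Now, since $U$ is a cut of $t = \omega$, the unique element $u$ of $U$ that $t = \omega$ goes through, is
$u = \omega$, and therefore $\underline{P}(f|U)(\omega) = \underline{P}(f|\omega) = f(\omega)$, again by Proposition 4.1. This tells us
that in this case indeed $\underline{P}(f|t) = \underline{P}(g|t)$.

Secondly, suppose that $t$ is not a terminal situation. Then it follows from Proposition 2.7
and the cut conglomerability of $\mathcal{E}_{\mathcal{R}}$ that $\underline{P}(f|t) \ge \underline{P}(\underline{P}(f|U)|t) = \underline{P}(g|t)$ [recall that $\underline{P}(\cdot|t) =
\underline{P}(\cdot|\uparrow t)$ and that $\underline{P}(\cdot|U) = \underline{P}(\cdot|\mathcal{B}_U)$]. It therefore remains to prove the converse inequality
$\underline{P}(f|t) \le \underline{P}(g|t)$. Choose $\varepsilon > 0$, then using Eq. (9) we see that there is some $t$-selection $\mathcal{S}$
such that $f - \underline{P}(f|t) + \varepsilon \ge \mathcal{G}^{\mathcal{S}}_\Omega$ on all paths that go through $t$. Invoke Lemma 11, using the
notations introduced there, to find that

\[ f - \underline{P}(f|t) + \varepsilon \ge \mathcal{G}^{\mathcal{S}}_U + \sum_{u \in U \setminus \Omega} I_{\uparrow u}\mathcal{G}^{\mathcal{S}_u}_\Omega \quad (\text{on } \uparrow t). \tag{22} \]

Now consider any $u \in U$. If $u$ is a terminal situation $\omega$, then by Proposition 4.1, $g(u) =
\underline{P}(f|\omega) = f(\omega)$, and therefore Eq. (22) yields

\[ g(\omega) - \underline{P}(f|t) + \varepsilon \ge \mathcal{G}^{\mathcal{S}^U}_\Omega(\omega), \tag{23} \]

also taking into account that $\mathcal{G}^{\mathcal{S}}_U = \mathcal{G}^{\mathcal{S}^U}_\Omega$ [see Eq. (11)]. If $u$ is not a terminal situation
then for all $\omega \in \uparrow u$, Eq. (22) yields

\[ f(\omega) - \underline{P}(f|t) + \varepsilon \ge \mathcal{G}^{\mathcal{S}}_U(u) + \mathcal{G}^{\mathcal{S}_u}_\Omega(\omega), \]

and since $\mathcal{S}_u$ is a $u$-selection, this inequality together with Eq. (9) tells us that $\underline{P}(f|u) \ge
\underline{P}(f|t) - \varepsilon + \mathcal{G}^{\mathcal{S}}_U(u)$, and therefore, for all $\omega \in \uparrow u$,

\[ g(\omega) - \underline{P}(f|t) + \varepsilon \ge \mathcal{G}^{\mathcal{S}^U}_\Omega(\omega). \tag{24} \]

If we combine the inequalities (23) and (24), and recall Eq. (9), we get that $\underline{P}(g|t) \ge
\underline{P}(f|t) - \varepsilon$. Since this holds for all $\varepsilon > 0$, we may indeed conclude that $\underline{P}(g|t) \ge \underline{P}(f|t)$. □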

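Proof of Proposition 8. The condition is clearly sufficient, so let us show that it is also
necessary. Suppose that $I_{\uparrow t} f \in \mathcal{E}_{\mathcal{R}}$, then there is some $t$-selection $\mathcal{S}$ such that $f \ge \mathcal{G}^{\mathcal{S}}_\Omega$ on $\uparrow t$,
by Theorem 3 [or Lemma 13]. Define, for any $u \in U \setminus \Omega$, the selection $\mathcal{S}_u$ as follows:
$\mathcal{S}_u(s) := \mathcal{S}(s)$ if $s \sqsupseteq u$ and $\mathcal{S}_u(s) := 0$ elsewhere. Then, by Lemma 11,

\[ \mathcal{G}^{\mathcal{S}}_\Omega = \mathcal{G}^{\mathcal{S}}_U + \sum_{u \in U \setminus \Omega} I_{\uparrow u}\mathcal{G}^{\mathcal{S}_u}_\Omega. \]

Now fix any $u$ in $U$. If $u$ is a terminal situation $\omega$, then it follows from the equality above
that

\[ f_U(u) = f(\omega) \ge \mathcal{G}^{\mathcal{S}}_U(u). \]

If $u$ is not a terminal situation, we get for all $\omega \in \uparrow u$:

\[ f_U(u) = f(\omega) \ge \mathcal{G}^{\mathcal{S}}_U(u) + \mathcal{G}^{\mathcal{S}_u}_\Omega(\omega), \]

whence, by taking the supremum over all $\omega \in \uparrow u$,

\[ f_U(u) \ge \mathcal{G}^{\mathcal{S}}_U(u) + \sup_{\omega \in \uparrow u} \mathcal{G}^{\mathcal{S}_u}_\Omega(\omega) \ge \mathcal{G}^{\mathcal{S}}_U(u), \]

where the last inequality follows since $\sup_{\omega \in \uparrow u} \mathcal{G}^{\mathcal{S}_u}_\Omega(\omega) \ge 0$ by Lemma 12 [with $t = u$ and
$\mathcal{S} = \mathcal{S}_u$]. Now recall that $f_U \ge \mathcal{G}^{\mathcal{S}}_U$ is equivalent to $I_{\uparrow t} f \ge \mathcal{G}^{\mathcal{S}^U}_\Omega$ [see Eq. (11)]. □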

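Proof of Theorem 10. This proof builds on an intriguing idea, used by Shafer and Vovk in
a different situation and form; see [30, Lemma 3.3].

Because $|h_s - m_s| \le B$ for all $t \sqsubseteq s \sqsubset U$, it follows that $G_U(u) \ge -B$, and it therefore
suffices to prove the inequality for $\varepsilon < B$. We work with the upper probability $\overline{P}(\Delta^c_{t,\varepsilon}|t)$ of
the complementary event $\Delta^c_{t,\varepsilon} := \{G_U < -\varepsilon\}$. It is given by

\[ \inf\left\{\alpha : \alpha - \mathcal{G}^{\mathcal{S}}_\Omega \ge I_{\Delta^c_{t,\varepsilon}} \text{ for some } t\text{-selection } \mathcal{S}\right\}. \tag{25} \]

Because $G_U$ is $U$-measurable, we can (and will) consider $\Delta^c_{t,\varepsilon}$ as an event on $U$. In the
expression (25), we may assume that $\alpha \ge 0$. Indeed, if we had $\alpha < 0$ and $\alpha - \mathcal{G}^{\mathcal{S}}_\Omega \ge I_{\Delta^c_{t,\varepsilon}}$ for
some $t$-selection $\mathcal{S}$, then it would follow that $\mathcal{G}^{\mathcal{S}}_\Omega \le \alpha < 0$, contradicting Lemma 12. Fix
therefore $\alpha > 0$ and $\delta > 0$ and consider the selection $\mathcal{S}$ such that $\mathcal{S}(s) := \lambda_s(h_s - m_s) \in \mathcal{R}_s$
for all $t \sqsubseteq s \sqsubset U$ and let $\mathcal{S}(s)$ be zero elsewhere. Here

\[ \lambda_s := \alpha\delta \prod_{t \sqsubseteq v \sqsubset s} [1 + \delta(m_v - h_v(s))] = \alpha\delta \prod_{t \sqsubseteq v \sqsubset s} [1 + \delta(m_v - h_v(u))], \tag{26} \]

where $u$ is any element of $U$ that follows $s$. Recall again that $-B \le h_s - m_s \le B$, so if we
choose $\delta < \frac{1}{2B}$, we are certainly guaranteed that $\lambda_s > 0$ and therefore indeed $\lambda_s(h_s - m_s) \in
\mathcal{R}_s$. After some elementary manipulations we get for any $u \in U$ and any $\omega \in \uparrow u$:

\[ \mathcal{G}^{\mathcal{S}}_\Omega(\omega) = \sum_{t \sqsubseteq s \sqsubset u} (h_s(u) - m_s)\lambda_s = \sum_{t \sqsubseteq s \sqsubset u} (h_s(u) - m_s)\,\alpha\delta \prod_{t \sqsubseteq v \sqsubset s} [1 + \delta(m_v - h_v(u))] \]

where the second equality follows from Eq. (26). [The $\mathcal{G}^{\mathcal{S}}_\Omega$ is $U$-measurable.] If we let
$\xi_s := m_s - h_s(u)$ for ease of notation, then we get

\[ \mathcal{G}^{\mathcal{S}}_U(u) = -\alpha \sum_{t \sqsubseteq s \sqsubset u} \delta\xi_s \prod_{t \sqsubseteq v \sqsubset s} [1 + \delta\xi_v] = \alpha \sum_{t \sqsubseteq s \sqsubset u} \prod_{t \sqsubseteq v \sqsubset s} [1 + \delta\xi_v] - \alpha \sum_{t \sqsubseteq s \sqsubset u} \prod_{t \sqsubseteq v \sqsubseteq s} [1 + \delta\xi_v] \]
\[ = \alpha - \alpha \prod_{t \sqsubseteq v \sqsubset u} [1 + \delta\xi_v] = \alpha - \alpha \prod_{t \sqsubseteq v \sqsubset u} [1 + \delta(m_v - h_v(u))] \]

for all $u$ in $U$. Then it follows from (25) that if we can find an $\alpha > 0$ such that

\[ \alpha \prod_{t \sqsubseteq v \sqsubset u} [1 + \delta(m_v - h_v(u))] \ge 1 \]

whenever $u$ belongs to $\Delta^c_{t,\varepsilon}$, then this $\alpha$ is an upper bound for $\overline{P}(\Delta^c_{t,\varepsilon}|t)$. By taking
logarithms on both sides of the inequality above, we get the equivalent condition

\[ \ln\alpha + \sum_{t \sqsubseteq s \sqsubset u} \ln[1 + \delta(m_s - h_s(u))] \ge 0. \tag{27} \]

Since $\ln(1 + x) \ge x - x^2$ for $x > -\frac{1}{2}$, and $\delta(m_s - h_s(u)) \ge -\delta B > -\frac{1}{2}$ by our previous
restrictions on $\delta$, we find

\[ \sum_{t \sqsubseteq s \sqsubset u} \ln[1 + \delta(m_s - h_s(u))] \ge \sum_{t \sqsubseteq s \sqsubset u} \delta(m_s - h_s(u)) - \sum_{t \sqsubseteq s \sqsubset u} [\delta(m_s - h_s(u))]^2 \]
\[ \ge \delta \sum_{t \sqsubseteq s \sqsubset u} [m_s - h_s(u)] - \delta^2 n_U(u) B^2 \]
\[ = n_U(u)\,\delta\left[-G_U(u) - B^2\delta\right]. \]

But for all $u \in \Delta^c_{t,\varepsilon}$, $-G_U(u) > \varepsilon$, so for all such $u$

\[ \sum_{t \sqsubseteq s \sqsubset u} \ln[1 + \delta(m_s - h_s(u))] > n_U(u)\,\delta(\varepsilon - B^2\delta). \]

If we therefore choose $\alpha$ such that for all $u \in U$, $\ln\alpha + n_U(u)\delta(\varepsilon - B^2\delta) \ge 0$, or
equivalently $\alpha \ge \exp(-n_U(u)\delta(\varepsilon - B^2\delta))$, then the above condition (27) will indeed be satisfied
for all $u \in \Delta^c_{t,\varepsilon}$, and then $\alpha$ is an upper bound for $\overline{P}(\Delta^c_{t,\varepsilon}|t)$. The tightest (smallest) upper
bound is always (for all $u \in U$) achieved for $\delta = \frac{\varepsilon}{2B^2}$. Replacing $n_U$ by its minimum $N_U$
allows us to get rid of the $u$-dependence, so we see that $\overline{P}(\Delta^c_{t,\varepsilon}|t) \le \exp(-\frac{N_U\varepsilon^2}{4B^2})$. We
previously required that $\delta < \frac{1}{2B}$, so if we use this value for $\delta$, we find that we have indeed
proved this inequality for $\varepsilon < B$. □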
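
The two computational steps in this proof that are easiest to get wrong, the telescoping of the gamble process and the choice of $\delta$, can be checked numerically. The sketch below does so for a hypothetical path, described by the numbers $\xi_s = m_s - h_s(u)$; the function names and all sample values are assumptions made for illustration.

```python
import math
from typing import Sequence


def gamble_process_value(xis: Sequence[float], alpha: float, delta: float) -> float:
    """Sum over s of lambda_s * (h_s - m_s), with lambda_s as in Eq. (26) and xi_s = m_s - h_s."""
    total, running_product = 0.0, 1.0
    for xi in xis:
        lam = alpha * delta * running_product  # product over the v strictly before s
        total += -xi * lam                     # (h_s - m_s) * lambda_s = -xi_s * lambda_s
        running_product *= 1.0 + delta * xi
    return total


def closed_form(xis: Sequence[float], alpha: float, delta: float) -> float:
    """The telescoped expression alpha - alpha * prod(1 + delta * xi_v)."""
    return alpha - alpha * math.prod(1.0 + delta * xi for xi in xis)


def best_delta(eps: float, bound: float) -> float:
    """The value delta = eps / (2 * B^2) that maximises delta * (eps - B^2 * delta)."""
    return eps / (2.0 * bound ** 2)


# Hypothetical path with |xi_s| <= B = 1 and delta < 1 / (2 * B).
xis = [0.3, -0.7, 0.5, 0.9]
print(gamble_process_value(xis, alpha=0.4, delta=0.25), closed_form(xis, alpha=0.4, delta=0.25))

d = best_delta(eps=0.2, bound=1.0)
print(d * (0.2 - 1.0 ** 2 * d), 0.2 ** 2 / (4 * 1.0 ** 2))
```

Both printed pairs should agree up to rounding: the first confirms the telescoping identity, the second that the exponent rate at the optimal $\delta$ equals $\varepsilon^2/(4B^2)$.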